Currently, only language sets / keyboards that work with the ascii character set will

ioHub key logging module is not unicode compliant on Windows,about isolver/iohub

Comments (6)

peircej commented on June 24, 2024

I'm not sure about /replace/ with a unicode version. The idea would be to know what key (by name) is pressed and also which character it represents. e.g. I could be pressing 'e' but using modifiers that give it an accent - some users want the key and others want the character.


isolver commented on June 24, 2024

The 'intention', which may not be fully working yet, is that both are given.

The goal, I believe, is to still make the primary purpose of this type of
'scientific / experimental' key logger to create an event for every actual
physical key pressed on the keyboard. So if you have to press 2 - 3 keys to
get the 'output' character intended, then ioHub should report all of those
key events, since that was what was pressed, but also give the information
needed to know what the end intent was by the user if the sequence of key
presses was looked at. This is not a text editor key event API, it is a 'the
keyboard is being used as an input device for my experiment' key event API.
;)

So with that goal in mind, this is what some of the relevant fields of each
key event contain:

1. Scancode field: The OS / local / keyboard dependent scan code for
   each key event. This is not going to be useful most of the time, but if
   errors in key mappings are found and someone wants to try and recover the
   meaningful data, this is needed.
2. Char code / Key ID field: The charcode / key_id for each key press. This
   is the value given by every OS after the scan code has been run through the
   keyboard and local mapping tables. The value is expected to have a 1 to 1
   relationship to actual characters, or to key descriptions for non visible
   keys. It is still OS dependent though for sure. On OSX, it seems that some
   of the time this code is actually the utf-8 encoded value for the key, but
   other times it is not, so this can not be counted on. In Windows and Xlib
   it is definitely 'not' the Unicode encoded key value.
3. The uchar field: Given the level 2 information, each OS has a way to
   get the utf-8 encoded Unicode value for the character, if it is a
   valid Unicode character, or if it has been assigned a utf-8 encoding by
   the OS vendor (Apple has done this for several dozen characters, for
   example many of the 'operational' function keys on the keyboard, even
   the Apple logo). If a given key event falls in the category
   of having a utf-8 encoded value available, then that value is all that is
   needed to also present the actual key graphic to the user, as long
   as the text viewer they are using supports displaying utf-8 characters of
   course, which most do. The code field has the Unicode code unit version of
   what you would normally see as hex in a utf-8 encoded character (I guess
   up to 3 or even 4 bytes can be used for 1 utf-8 encoded char). If there is
   not a 'real' unicode value for the key pressed, or it can not be
   determined, fallbacks are used when possible, like OS provided lookup
   tables etc. For example, on OSX, Apple reserved a block of the unicode
   address space and has used it to define 'unicode' values for most / all
   of the non-visible type keys / actions that can be
   triggered on the Mac. That is what the UnicodeChars sub class of the
   KeyboardConstants class is, a copy from part of the NSEvent.h file. If both
   these steps fail, then it is given a 'Dead Char' assignment in this field.
   Some keys will be 'dead' keys, which have no representation at this level
   because they do not generate any input themselves in terms of a char /
   uchar. You gave an example of such a case. Even the 'modifier' keys we are
   used to can be classified as 'dead' characters in this sense (shift, alt,
   ctrl, etc). These types of keys seem to always require a 'lookup' table type
   approach to be able to provide a human readable / meaningful
   representation of the key. This is also OS dependent. This is why there is
   a fourth field of information for each key event: the 'key' field.
4. The key field: This is the only field most people will ever need to worry
   about. ;) This field holds the actual unicode char, and can be viewed as
   intended. If it is a valid, visible, utf-8 encoded char we are dealing
   with, then the only difference between field 3 and 4 is that field 4 has
   been decoded so it is a unicode char, not a sequence of 8 bit chars, as far
   as the environment is concerned, so it can be viewed properly as it should.
   If the key is not a visible key or does not have a utf-8 encoded version
   that the OS provided, then the key field will hold the "text label" for the
   char. Common examples of this are the Page Up and Page Down keys, Escape,
   arrow keys, all the common modifier keys, function keys, multimedia keys
   etc. In these cases the key field holds the name / label for the key. This
   is very OS specific, so I have added the ability to define a mapping file
   between the 'standard' ioHub label for such keys and the label the current
   OS uses. This way the idea is that even most dead / non visible keys can be
   given a consistent label by ioHub across OSs. This is the idea anyhow. Not
   complete yet for sure.
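
To make the four fields a bit more concrete, here is a minimal Python sketch of how they could fit together. This is not the actual ioHub code: the `KeyEvent` class, the `build_key_event` helper, the `STANDARD_LABELS` table and the example key ids are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from an OS key id to a 'standard' label.
# The ids are illustrative only; a real table would be defined per OS.
STANDARD_LABELS = {0x1B: "escape", 0x21: "pageup", 0x22: "pagedown", 0x10: "shift"}


@dataclass
class KeyEvent:
    scan_code: int            # field 1: OS / keyboard dependent scan code
    key_id: int               # field 2: char code / key id after OS mapping
    uchar: Optional[bytes]    # field 3: utf-8 encoded bytes, if available
    key: str                  # field 4: decoded char, or a text label


def build_key_event(scan_code: int, key_id: int,
                    uchar: Optional[bytes] = None) -> KeyEvent:
    """Fill the key field from uchar when it decodes to a visible char,
    otherwise fall back to the label lookup table (assumed behaviour)."""
    key = ""
    if uchar is not None:
        decoded = uchar.decode("utf-8")
        if decoded.isprintable():
            key = decoded
    if not key:
        key = STANDARD_LABELS.get(key_id, "unknown")
    return KeyEvent(scan_code, key_id, uchar, key)


accented = build_key_event(0x12, 0x45, "é".encode("utf-8"))
page_up = build_key_event(0x49, 0x21)
print(accented.key, page_up.key)
```

The point of the sketch is only the order of the fallbacks: a visible character wins, and the label table is consulted for everything else.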

I've learned a lot about this stuff over the last 1/2 week, and why it has
taken so long to get the initial ports done, and why the design
has evolved from one OS to the other. Until I actually got how each OS
is doing it and what functions I had access to on each, I could not make a
proper overall plan that 'should' be able to address the OS
differences while making things as consistent as possible for people, using
this multi stage approach.

I also figured out how to actually display a unicode character properly in
a console window when a script outputs it, instead of showing the ascii
version of the utf-8 encoded text, or junk. I have not perfected it in terms
of handling all the use cases that can occur, but it is possible, even when
sending the unicode text over a network or pipe and displaying it in the
console of another process. I have started to fix the bugs in ioHub in this
regard when I run into them. It seems to me that there are two main causes
for a unicode character or text not displaying correctly. a) A piece of
text that is already utf-8 encoded gets encoded with utf-8 again by
another part of the program somewhere, so the doubly encoded text is
garbage and can not be displayed correctly. b) Unicode text that has been
converted to a utf-8 encoded string does not get decoded 'on the other
side' back into a unicode string, or the decoding process does not use
utf-8 explicitly and therefore falls back to the default encoding type of
the OS or pipe (like stdout). If the text is never decoded, you get shown
the hex looking string of text, the utf-8 encoded string rep of the unicode
text. If a different encoding was used when decoding the utf-8
string back to unicode text, you get garbage. The same encoding scheme
(utf-8 IMO is what we should standardize on) must be used to both encode and
decode the data for things to work right. It is not really possible for
software to automatically determine the data's encoding type in a reliable
way programmatically, so the encoding type to be used should
be explicitly specified. Inside the program, just always use unicode strings
/ text. Making sure things are unicode strings instead of 8 bit
char strings is easy to change, as the python interface supports all the
same operations on unicode text as on 8 bit char strings. Then when the
program needs to either 'output' the text or 'input' text, that is really
the only time we need to worry about ensuring a consistent encoding type is
used and that things are not double encoded or decoded.
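
The failure modes described above can be reproduced in a few lines. This is only an illustrative sketch written for Python 3 (where text strings are always unicode and encoded data is `bytes`); the choice of `cp1252` as the 'wrong' default codec is an assumption for the example, not something taken from ioHub.

```python
original = "é"                        # unicode text inside the program
encoded = original.encode("utf-8")    # bytes to send over a pipe / network

# Never decoded on the other side: you see the hex looking representation.
undecoded_view = repr(encoded)

# Decoded with a different codec than was used to encode: garbage.
wrong_codec = encoded.decode("cp1252")

# Text that was already encoded gets treated as text and encoded again.
double_encoded = wrong_codec.encode("utf-8")

# Same codec named explicitly on both sides: the round trip works.
round_trip = encoded.decode("utf-8")

print(undecoded_view, wrong_codec, double_encoded, round_trip)
```

Naming `utf-8` explicitly at both the encode and decode step is what keeps the last line correct regardless of the console's or pipe's default.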

Anyhow, that was a bit of a dump, but it was good to get it down in
writing I think. ;) There is still lots to do and fix in this area, but
now is the time to do it and I think I finally 'get it'. Well mostly. ;) I
am able to actually send unicode chars from the ioHub process in an error
for example, send them over the network, and then have them displayed as
the correct unicode character graphics in a command prompt on Windows or
OSX, or in a python shell. I could never get the printing to work right
before, so I think I must be going in the general right direction.

Let me know if you see any issues, etc.
Thanks!


peircej commented on June 24, 2024

Honestly, I haven't got time to read that amount of text properly! But
from a quick scan it sounds like you were already going for the concept
I had in my head (multiple fields so the user can extract different
forms of info). Great!


Jonathan Peirce
Nottingham Visual Neuroscience

http://www.peirce.org.uk


isolver commented on June 24, 2024

I also just realized you were picking up on the bug in github. I need to
check that actually. Thanks for the very relevant reminder. ;)

I had read that pyHook was not unicode compliant because it used the A
versions of the Windows functions. However, I have done tests entering a few
French letters, since my keyboard supports that, and they seemed to be
stored and decodable back to the right letter. I will try
some Chinese ones using the on-screen keyboard and see how that goes later.


isolver commented on June 24, 2024

I have looked at the pyHook code and I do not think there is an issue. pyHook does not 'give' you unicode characters, but using the information it does give, it should be possible to get them ourselves. The keycode and scancode should be valid and can be used in a call to the ToUnicode function. The 'twist' is that since we are getting the events from a LL hook in a separate process, we can not use the GetKeyboardState() function to get the keyboard state that is also passed into the ToUnicode call. Each thread has a different keyboard state array in Windows, and there is no 'standard way' for one process to get another process's keyboard state array. So the plan is to create and maintain a keyboard state array that is compatible with what needs to be handed to the ToUnicode function.
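
A rough sketch of that plan, using ctypes, could look like the following. It is not the ioHub implementation: the `KeyboardState` class and `to_unicode` helper are hypothetical names, and mirroring the left / right modifier codes onto the generic VK_SHIFT / VK_CONTROL / VK_MENU slots is an assumption about what ToUnicode needs. What is taken from the Win32 documentation is the layout of the 256 byte state array (high bit set means the key is down, low bit set means a toggle key such as Caps Lock is on) and the argument order of `ToUnicode`.

```python
import ctypes
import sys

KEY_DOWN = 0x80      # high bit: key is currently pressed
TOGGLED = 0x01       # low bit: toggle key (e.g. Caps Lock) is on
TOGGLE_KEYS = {0x14, 0x90, 0x91}   # VK_CAPITAL, VK_NUMLOCK, VK_SCROLL
# Left / right modifier codes -> generic code (VK_SHIFT, VK_CONTROL, VK_MENU).
GENERIC = {0xA0: 0x10, 0xA1: 0x10, 0xA2: 0x11, 0xA3: 0x11, 0xA4: 0x12, 0xA5: 0x12}


class KeyboardState:
    """Keyboard state array maintained by hand from hook events."""

    def __init__(self):
        self.state = (ctypes.c_ubyte * 256)()

    def update(self, vk, pressed):
        if pressed:
            # Flip the toggle bit only on the initial press, not on auto-repeat.
            if vk in TOGGLE_KEYS and not (self.state[vk] & KEY_DOWN):
                self.state[vk] ^= TOGGLED
            self.state[vk] |= KEY_DOWN
        else:
            self.state[vk] &= 0x7F
        generic = GENERIC.get(vk)
        if generic is not None:
            # Generic modifier is down while either side is still down.
            sides = [k for k, g in GENERIC.items() if g == generic]
            down = any(self.state[k] & KEY_DOWN for k in sides)
            self.state[generic] = KEY_DOWN if down else 0


def to_unicode(vk, scan_code, kb_state):
    """Translate a key press to text with the Win32 ToUnicode function."""
    if sys.platform != "win32":
        raise OSError("ToUnicode is only available on Windows")
    buf = ctypes.create_unicode_buffer(8)
    n = ctypes.windll.user32.ToUnicode(vk, scan_code, kb_state.state,
                                       buf, len(buf), 0)
    # n > 0: number of chars written; n < 0: dead key; n == 0: no translation.
    return buf[:n] if n > 0 else ""
```

On each key event from the hook, `update()` would be called first and `to_unicode()` afterwards for press events, so the array always reflects the modifiers that were held at that moment.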


peircej commented on June 24, 2024

Unicode can be addressed later if needed. It is something that people
have requested before, but it isn't currently provided, so it isn't like
anybody would be losing anything.

Jon


Jonathan Peirce
Nottingham Visual Neuroscience

http://www.peirce.org.uk


ioHub key logging module is not unicode compliant on Windows about iohub HOT 6 CLOSED
