
Comments (29) on x16-rom

tomxp411 commented on September 15, 2024

Yes, because it runs at 2 MHz compared to the roughly 1 MHz that the C64 runs at. If you compare it with the C128 in fast mode (2 MHz as well), then the advantage shrinks down to maybe 10%, which is basically (no pun intended) the amount that Basic V7 is slower than Basic V2.

I've actually tested this. A PAL C64 runs my Prime Number generator in 32 seconds, and 31 seconds on NTSC. A VIC-20 runs it in 26, and a 2 MHz BBC Micro runs it in 16.4 seconds. So the BBC is actually slightly slower per-clock than the Commodore 64.

So this is correct: while BBC BASIC has some nice features, it's not necessarily faster than MS BASIC, and we shouldn't lean on it as any sort of performance metric.

mist64 commented on September 15, 2024

Yes, converting int to float is very cheap, but taking a float from the instruction stream is free. (Compare this to converting ASCII to float, which is super expensive.) I'd prefer floats in the instruction stream, because they are fastest and work for all numeric literals. Ints in the instruction stream work for only a subset of cases and are slower.
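
For concreteness, here is a minimal sketch in Python of what "a float in the instruction stream" would mean at tokenize time, assuming the usual CBM 5-byte memory float layout (an excess-128 exponent byte, then four big-endian mantissa bytes whose implicit leading 1 is replaced by the sign bit); the function name is purely illustrative:

import math

def to_mflpt5(x):
    # Encode a number in the 5-byte CBM memory float format:
    # byte 0 = exponent (excess-128, 0 means the value is zero),
    # bytes 1-4 = mantissa, big-endian, leading 1 replaced by the sign bit.
    if x == 0.0:
        return bytes(5)
    m, e = math.frexp(abs(x))          # x = m * 2**e with 0.5 <= m < 1
    mant = round(m * (1 << 32))        # 32-bit mantissa with the top bit set
    if mant == 1 << 32:                # rounding overflowed; renormalize
        mant >>= 1
        e += 1
    if not 1 <= e + 128 <= 255:
        raise OverflowError("out of range for the 5-byte format")
    mant &= 0x7FFFFFFF                 # drop the implicit leading 1 ...
    if x < 0:
        mant |= 0x80000000             # ... and put the sign bit in its place
    return bytes([e + 128]) + mant.to_bytes(4, "big")

# to_mflpt5(1.0) -> 81 00 00 00 00, to_mflpt5(-10.0) -> 84 A0 00 00 00

A tokenizer would emit those five bytes right after a prefix token, and the interpreter would copy them straight into the FAC instead of calling the ASCII-to-float routine.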

paulscottrobson commented on September 15, 2024

If you know you are normalising a byte, that reduces the normalisation overhead (the shifting) quite a lot, because you can zero and forget the rest of the mantissa.

tomxp411 commented on September 15, 2024

The other problem is formatting. If the programmer entered a number in hex, it should be displayed in hex later. And some floating point numbers don't have exact decimal matches. That all has to be addressed somehow, too.

BruceMcF commented on September 15, 2024

tomxp411:

The other problem is formatting. If the programmer entered a number in hex, it should be displayed in hex later.

Yes, that is the reason that, in the original proposal, the parsed tiny integer does not replace the text literal but rather prefixes it, and the routine would skip past it. This is a speed versus code size trade-off.

Now, FOR integers (bytes, 16bit, or hypothetical 32bit), all that is needed to recreate the integer is the base, width, and whether it is signed or unsigned. And unlike Forth, so far the bases supported are 2, 10 and 16. So two bits for base, two bits for width, one bit for signed ... five bits would encode everything needed to recreate a pre-parsed integer. And some of those combinations might not be needed ... if people want to try to enter hexadecimal values as negatives, I'd be OK with the parser throwing that as an error, or else LIST just giving it back to them as the actual hexadecimal value.

And some floating point numbers don't have exact decimal matches. That all has to be addressed somehow, too.

In the original proposal, only exact tiny integers were pre-parsed, and they all have exact decimal matches.

Which links up with kktos:

If I got it right, here, we need either

  • a token for a number AND a way to tell its type, i.e. small int, int or float
    type float could be handled immediately, whereas integer types would need some significant work on basic (adds %)
  • or 3 tokens, one for each type.

If there is a token, a descriptor and data, and to save space the literal string is to be recovered from the value, then the high bit (bit 7) would select between int and float. For int, there are bytes and ints in signed decimal, unsigned decimal, unsigned binary or unsigned hex. That's 3 bits. Reserve longs in signed decimal, unsigned decimal, unsigned binary and unsigned hex and that's 4 bits with 4 values unassigned, but the descriptor might be an index into two tables, one holding the size and one holding the base, and if the index is 0, the output is signed.

For floating point, the descriptor might record whether it was entered in decimal or exponential notation, the width of the value (or mantissa part), and the number of decimal places. As I write this, I doubt that all of those fit into 7 bits, but they would surely fit into 15, so floating point might need a second descriptor byte to reproduce the original value. It's also a long time since I've thought about recovering the original floating point input from the stored value; there might be a rounding trim input in addition to width and decimal place ... but I still think it would all fit into 15 bits with room to spare.

[Edit] I might add, if the original floating point entry is too problematic to recreate correctly from the stored floating point value, one could skip floating point entirely and avoid reparsing literal floats in tight inner loops in the traditional way by setting them equal to a variable and using the variable in the inner loop. Then it would just be $FE $FF [descriptor] [data] ... bottom two bits in the descriptor 00=signed decimal, 01=unsigned decimal, 10=unsigned binary, 11=unsigned hex, next two bits 00 for byte, 01 for int, 10 and 11 reserved for 32bit integers.

Kroc commented on September 15, 2024

Looking at the BASIC benchmark numbers (https://en.wikipedia.org/wiki/Rugg/Feldman_benchmarks#Sample_results): even doubling the BBC BASIC times to effectively slow it down to 1 MHz, it still comes out on top, but only just.

The thing is, Commodore never cared about the performance of their BASIC, whereas Sophie Wilson continually improved BBC BASIC throughout its life, even after moving to ARM, and was obsessive over its performance. It definitely is a high-quality BASIC that's still revered [here in the UK] by University professors and scientists who used it for everything in the 80s.

mist64 commented on September 15, 2024

I like this idea!

But if this encodes integer bytes, the interpreter will still have to convert them to floating point, since all variables are floating point internally. So instead, literals could be 5 float bytes in standard FAC encoding, following the prefix byte.

Kroc commented on September 15, 2024

BASIC does support integers (A%), no? Are these converted into floating point internally? Because they've always been faster than the default floats. Perhaps you could piggy-back on the existing BASIC integers?

mist64 commented on September 15, 2024

BASIC always uses floating point arithmetic and memory representation. Integer variables are just truncated on every store.

BASIC could be extended to support integers natively, which should also have great speed benefits, but this is an independent (huge) task.

tomxp411 commented on September 15, 2024

BASIC does support integers (A%), no? Are these converted into floating point internally? Because they've always been faster than the default floats.

My benchmarks showed the opposite, that doing math with integer variables was actually slower than with floats.

After a test on the CX16 emulator:
An A=A+B loop does 1667 operations per second with integer variables (A%) and 1875 with floats.

Here is the test program.

5 TI$="000000"
6 N=5000
10 A%=1:B%=1
20 FOR I=1 TO N
30 A%=A%+B%
40 NEXT
50 PRINT "INT TEST";TI,N/TI*60
90 TI$="000000"
100 A=1:B=1
110 FOR I=1 TO N
120 A=A+B
130 NEXT
140 PRINT "FLOAT TEST";TI,N/TI*60

BruceMcF commented on September 15, 2024

I like this idea!

But if this encodes integer bytes, the interpreter will still have to convert them to floating point, since all variables are floating point internally. So instead, literals could be 5 float bytes in standard FAC encoding, following the prefix byte.

This optimization is orthogonal to optimizing integer processing ... converting a binary integer value to floating point is faster than parsing an ASCII integer literal into floating point.

But if an integer operations parsing loop was added to the current parsing loop, yes, that would pile up the savings from this optimization.

BruceMcF commented on September 15, 2024

BASIC does support integers (A%), no? Are these converted into floating point internally? Because they've always been faster than the default floats. Perhaps you could piggy-back on the existing BASIC integers?

Integers in CBM Basic have always been slower ... there are Basics that do normal type promotion and compute all integer operations using integer routines, but to save space CBM V2 doesn't include that. (Also, Bill Gates wanted a per-unit license fee for the Basic that did that, and Tramiel didn't want to pay a per-unit license fee; he wanted a one-and-done license fee ... he said he had already gotten married once, and he didn't plan on getting married a second time.)

tomxp411 commented on September 15, 2024

I think that pre-parsing all numeric literals would be of significant benefit. For the actual token code, I believe bytes 2-31 are mostly unused. So use something like 2 as a prefix byte, followed by 5 bytes for the binary number.

Ideally, certain constructs would be optimized for integer values, but that is perhaps a "2.0" problem.
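
A rough sketch (Python) of the prefix-byte pass described above, using 2 as the prefix byte and the to_mflpt5() helper sketched earlier; it is deliberately simplified (plain decimal literals only, no exponent notation and no hex/binary prefixes), so treat it as an illustration rather than the real tokenizer:

NUM_PREFIX = 0x02   # hypothetical prefix token for a pre-parsed literal

def preparse_line(line_bytes):
    # Replace ASCII numeric literals in a tokenized line with NUM_PREFIX
    # followed by the 5-byte float; line-number targets after GOTO/GOSUB
    # would be converted too, so the interpreter would have to accept that.
    out = bytearray()
    i = 0
    while i < len(line_bytes):
        c = line_bytes[i]
        if c == 0x22:                             # '"': copy quoted strings verbatim
            j = line_bytes.find(0x22, i + 1)
            j = len(line_bytes) if j < 0 else j + 1
            out += line_bytes[i:j]
            i = j
        elif (0x30 <= c <= 0x39 or c == 0x2E) and not (out and 0x41 <= out[-1] <= 0x5A):
            # digit or '.', and not the tail of a variable name like A1
            j = i
            while j < len(line_bytes) and (0x30 <= line_bytes[j] <= 0x39
                                           or line_bytes[j] == 0x2E):
                j += 1
            text = line_bytes[i:j].decode()
            out.append(NUM_PREFIX)
            out += to_mflpt5(0.0 if text == "." else float(text))
            i = j
        else:
            out.append(c)
            i += 1
    return bytes(out)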

paulscottrobson commented on September 15, 2024

Efficiency depends on how BASIC works. Converting a byte integer to FAC is very cheap: you just have to get the mantissa right and strip out the sign (I can't remember the exact format OTOMH). BruceMcF is correct about common integers; I remember reading a paper not too long ago where someone had actually studied this, and the number of distinct integer values in common use was ridiculously small. Probably 0, -1, common powers of 10 and 16, and 42.

BruceMcF commented on September 15, 2024

Though a single byte is more compact than a float ... and normalizing a signed byte is faster than normalizing a two byte integer, since it can be done by shifting the accumulator and decrementing a register ... three bytes of the mantissa are always zero. And it gains the full benefit of a stored integer if an integer parsing loop is added later.

$FE $FE as a parsed float for literals that are not integers in the range of -128 to +127 would be a fine addition to the proposal.
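
A sketch of the byte normalisation described above (Python, mirroring what the 6502 routine would do: shift the single mantissa byte left while decrementing the exponent, with the three low mantissa bytes pinned at zero; the unpacked sign/exponent/mantissa split follows the FAC convention and the function name is illustrative):

def byte_to_fac(value):
    # Normalize a signed byte (-128..127) into (exponent, mantissa, sign),
    # with an excess-128 exponent and a 4-byte mantissa.
    sign = 1 if value < 0 else 0
    mant = abs(value)                  # at most 128, so it fits in one byte
    if mant == 0:
        return 0, bytes(4), 0          # exponent 0 encodes zero
    exp = 128 + 8                      # the byte sits in the top mantissa position
    while mant < 0x80:                 # at most 7 shifts for a byte
        mant <<= 1
        exp -= 1
    return exp, bytes([mant, 0, 0, 0]), sign

# byte_to_fac(1)  -> (0x81, 80 00 00 00, 0)   i.e.  0.5  * 2**1
# byte_to_fac(-3) -> (0x82, C0 00 00 00, 1)   i.e. -(0.75 * 2**2)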

kktos commented on September 15, 2024

If I got it right, here, we need either

  • a token for a number AND a way to tell its type, i.e. small int, int or float
    type float could be handled immediately, whereas integer types would need some significant work on basic (adds %)
  • or 3 tokens, one for each type.

The first seems more appealing, as it's intrinsically open to enhancements.

BruceMcF commented on September 15, 2024

Part of the point of the token for pre-parsed small integers was that they were small and simple numbers. So the answer for %1000 and $C8 would have been "those get parsed".

On further thinking, I believe that tomxp411 is possibly right about not really being able to recover the string entered for a floating point literal in all cases, even if giving a width and decimal place and rounding trim. In any event, I would want to do much more experimentation or research or both than I have time for in order to see.

However, I said that this would be orthogonal to the issue of an "outer" integer parsing loop, and if these integers are to be normalized into floating point values, my proposal has one flaw. The sign is distinct from the mantissa in a floating point value (IIUC, in the FAC and ARG it is stored in the high bit of the byte past the end of the mantissa, and only in the memory storage format does it replace the leading 1 which is always bit 32 of a properly normalized mantissa). So in this proposal, there should be a sign bit in the descriptor. It may as well be bit 7 if there is no certainty of a need to distinguish float and integer descriptors:

$FE $FF [descriptor] [data0] ...

[descriptor] = %s.......

Data width may as well be the low two bits ... even if it is not intended to generate 24bit integers, those two bits index to how far from the first data byte to the last one ...

[descriptor] = %s.....ww

Two bits serve for base. If the high bit is zero, the base is 10; if the high bit is 1, the base is either 2 or 16, depending on the low bit (the base is only used in printing, so mingling width and output type in one descriptor does no harm):

[descriptor] = %s...bbww

Finally, as a space saving measure, there are two "format" bits to specify if there was a space before or after the number, either of which is removed by parsing:

[descriptor] = %s.ffbbww

... and I would specify bit 6 as the "type" field for "float/integer", reserving the possibility that a floating point descriptor system might be sorted out sometime in the future. That makes the format:

$FE $FF [descriptor] [data0] {[...] [data+ww]}
[descriptor] = %stffbbww
s = sign of the integer (the data stores the absolute value as an unsigned int)
t = 0 for integer
ff = 1x for a leading space, 0x for none, x1 for a trailing space, x0 for none.
bb = base ID, 00=decimal, 10=binary, 11=hexadecimal
ww = width, 00=byte, 01=16bit, 11=32bit
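
As a toy illustration of that layout (Python): the $FE $FF framing and the field positions follow the %stffbbww spec above, while the little-endian data order and the names are assumptions made for the example:

BASES  = {10: 0b00, 2: 0b10, 16: 0b11}      # bb field
WIDTHS = {1: 0b00, 2: 0b01, 4: 0b11}        # ww field, width in data bytes

def encode_int_literal(value, base=10, width=1, lead_space=False, trail_space=False):
    # Build $FE $FF [descriptor] [data...] for a pre-parsed integer literal.
    desc  = 0x80 if value < 0 else 0                  # s: sign bit
    #       t (bit 6) stays 0 for an integer descriptor
    desc |= (0x20 if lead_space else 0) | (0x10 if trail_space else 0)  # ff
    desc |= BASES[base] << 2                          # bb: base, used by LIST
    desc |= WIDTHS[width]                             # ww: data width
    data  = abs(value).to_bytes(width, "little")      # magnitude only (order assumed)
    return bytes([0xFE, 0xFF, desc]) + data

# The literal "$C8" -> FE FF 0C C8  (hex base, one data byte, positive)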

kktos commented on September 15, 2024

About the float as a string: you can enter a float in many formats, but when the system displays it, it will use a default representation. Is it so important that we have the originally entered version in the LIST? I'm not sure.
But anyway, if that's important, then as previously suggested, we need to store both the ASCII and the pre-parsed number: the ASCII only for LIST and the pre-parsed form for execution.
K.I.S.S.

BruceMcF commented on September 15, 2024

About the float as a string: you can enter a float in many formats, but when the system displays it, it will use a default representation. Is it so important that we have the originally entered version in the LIST? I'm not sure.

For % and $, it kind of defeats the purpose of having binary and hexadecimal representation if you throw it away when you list it. And it is a bit silly to omit them if there are ample bits to represent the base once you have the bit to represent the sign and the bits to represent the width.

I'm not a big fan of adding eight bytes to every floating point literal ... if the choice is between that and using a floating point variable inside a loop to speed up floating point constants, I'd prefer the latter.

But anyway, if that's important, then as previously suggested, we need to store both the ASCII and the pre-parsed number: the ASCII only for LIST and the pre-parsed form for execution.
K.I.S.S.

It's only "necessary" for floats. Integers can always be recovered exactly as entered. So there's no need to bloat the program to that extent.

However, that is neither here nor there for this proposal: it leaves prospective pre-parsed floating point literals as reserved but undefined other than descriptor bit6=1.

tomxp411 commented on September 15, 2024

The more I think about it, the more I think this might be an “advanced” feature used by the editor.

I.e., when entering programs with line numbers, numbers don't get tokenized; but when using an external tokenizer (and eventually the advanced editor), number tokens can be used to pre-parse literals.

BruceMcF commented on September 15, 2024

The token would still need to be executed by the ROM Basic. And recognizing integers in the parsing step is straightforward if hex and binary are prefixed ... a prefix, or a string of digits without a decimal point ... so bailing and letting the original execution loop handle it if it's not an integer is a live option ...

But KISS would seem to be to pre-parse all numbers, so that number parsing moves entirely to program parsing (tokenization) and is not needed in the execution of the Basic program at all.

{ALTHOUGH even then, maybe dropping the ability from the main interpretation loop wouldn't be done if it is desired to run C64 Basic saved programs as-is in the CX16.}

Note that in remarks about the potential floating point descriptor, I had too much AWK / C printf on the brain ... only the width of the value as entered is needed, as the exponent and the selection between common decimal fractions and scientific notation together determine where the decimal point goes, and the fact it was not parsed as an integer implies that there SHOULD BE a decimal point there, somewhere. Then the fact that the width of the value to be printed as a floating point, including decimal point and possible sign, may be eight characters or more dictates a 5 bit width. From there, the specification of the descriptor follows.

It would still be two distinct "pre-parse integer" and "pre-parse floating point" routines, as the floating point descriptor doesn't have room for swallowing leading and/or trailing space. And sign demotes to bit6, since the first decision to be made is whether the pre-parsed number is stored as an integer or a floating point.

$FE $FF [descriptor] [data0] ...
[descriptor.7]=0 => integer descriptor
[descriptor.7]=1 => floating point descriptor

Integer descriptor = %tsffbbdd
t = 0 for integer
s = sign of the integer (the data stores the absolute value as an unsigned int)
ff = 1x for a leading space, 0x for none, x1 for a trailing space, x0 for none.
bb = base ID, 00=decimal, 10=binary, 11=hexadecimal
dd = data width, 00=byte, 01=16bit, 11=32bit

Floating point descriptor = %tsewwwww
t = 1 for floating point
s = sign of the mantissa
e = 0 for common decimal, 1 for exponential notation
wwwww = width of the number's displayed form as entered*

*NB: On occasion, a floating point may not round to the same final value in the display field ... this is just how floating points are: a FAQ may reassure users that this is the closest approximation to the binary value which is the closest approximation to the number they entered.
Note also, if they entered "." to get the fastest parsing of 0.0 ... that would still print out as "." because there's no other way to enter a single character wide literal that is not a small integer.
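
For the integer half of that scheme, a LIST-time recovery routine might look roughly like this (Python; the field positions follow the %tsffbbdd spec above, the data order is again assumed little-endian, and the function is purely illustrative):

def list_int_literal(desc, data):
    # Recreate the source text of a pre-parsed integer literal from its
    # %tsffbbdd descriptor and its data bytes.
    assert desc & 0x80 == 0              # bit 7 set would mean a float descriptor
    negative = bool(desc & 0x40)         # s
    lead  = " " if desc & 0x20 else ""   # ff: swallowed leading space
    trail = " " if desc & 0x10 else ""   # ff: swallowed trailing space
    base  = (desc >> 2) & 0b11           # bb
    # dd (bits 1-0) encodes the data width; len(data) already supplies it here.
    value = int.from_bytes(data, "little")
    if base == 0b00:
        text = ("-" if negative else "") + str(value)
    elif base == 0b10:
        text = "%" + format(value, "b")
    elif base == 0b11:
        text = "$" + format(value, "X")
    else:
        raise ValueError("base code 01 is unassigned")
    return lead + text + trail

# list_int_literal(0x0C, b"\xC8") -> "$C8"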

nonarkitten commented on September 15, 2024

Ah jeeze, keep it simple.

For internal structures we can use 32-bit NAN boxing for the 16-bit integers. This means we can store integers WITHIN floating point numbers basically for free. Adding the exception check and a handful of math routines would not add a lot to the whole ROM. NAN Boxing gives us 128 "types" of 16-bit values. So we can have string pointers, structures, bla bla bla, all within the same 32-bit value.
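
For what it's worth, a minimal sketch of the NaN-boxing idea (Python, operating on raw 32-bit cells rather than Python floats: a single-precision NaN is any pattern with an all-ones exponent field and a non-zero mantissa, leaving 23 payload bits, i.e. a 16-bit value plus a 7-bit tag; tag 0 is reserved because an all-zero mantissa would be infinity). This is only an illustration of the technique, not the poster's actual implementation:

EXP_MASK  = 0x7F800000      # all-ones exponent field marks Inf/NaN
MANT_MASK = 0x007FFFFF      # 23 mantissa bits available as payload

def box(tag, value16):
    # Pack a 7-bit type tag (1..127) and a 16-bit payload into a NaN cell.
    assert 1 <= tag <= 0x7F
    return EXP_MASK | (tag << 16) | (value16 & 0xFFFF)

def unbox(cell):
    # Return (tag, value16) for a boxed cell, or None for an ordinary float cell.
    if cell & EXP_MASK == EXP_MASK and cell & MANT_MASK:
        return (cell >> 16) & 0x7F, cell & 0xFFFF
    return None

A variable slot holding such a cell would only be handed to the float routines when unbox() returns None.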

For tokenization, I already did this for an interpreter I use internally at work -- although I handle decimals (0-9), bytes (-128 to 127), words and longs as well as floats, all separately. This ensures that not only am I using the least amount of space possible, but also using LESS SPACE than the original ASCII form took -- and that is in the SPIRIT of BASIC tokenization (I also strip spaces).

And it is REALLY FAST doing it like this. It's a little more complex handling two types embedded into variables and it's a little more complex handling type coercion, but avoiding re-parsing during execution is critical to performance. But now EVERYTHING is a token, every token is one byte so everything is just two 256-byte tables and a jump. My 6502 is pretty rusty, but something like:

NEXT:
  LDA (opcode_ptr),Y   ; fetch the next token
  TAX                  ; ... and use it as the table index
  INY
  BNE .L1              ; bump the pointer's high byte when Y wraps
  INC opcode_ptr+1
.L1:
  LDA table_high,X     ; push handler address-1, high byte first,
  PHA
  LDA table_low,X      ; ... low byte last ...
  PHA
  RTS                  ; ... so RTS "returns" into the handler

This bit of code lives in zero page and yes, it's not only using the SWEET16 RTS trick but also self-modifying code for the IP counter. We also only need to increment the high byte of the pointer when Y rolls over, although we "waste" a byte of zero page holding the zero.

BruceMcF commented on September 15, 2024

Ah jeeze, keep it simple.

For internal structures we can use 32-bit NAN boxing for the 16-bit integers. This means we can store integers WITHIN floating point numbers basically for free.

If your definition of "for free" is "3 additional bytes per 16bit integer is free", we have different definitions of free.

Adding the exception check and a handful of math routines would not add a lot to the whole ROM. NAN Boxing gives us 128 "types" of 16-bit values. So we can have string pointers, structures, bla bla bla, all within the same 32-bit value.

Remember that in the C64 floating point you are talking about, exponent $FF is just another exponent and there are no subnormal values ... all values with a 0 in the biased exponent are zero. There doesn't seem to be a NaN to box.

For tokenization, I already did this for an interpreter I use internally at work -- although I handle decimals (0-9), bytes (-128 to 127), words and longs as well as floats, all separately. This ensures that not only am I using the least amount of space possible, but also using LESS SPACE than the original ASCII form took -- and that is in the SPIRIT of BASIC tokenization (I also strip spaces).

The token codes already allocated to CBM V3.5 and CBM v7 are reserved. What is free are the extension CBM v7 token pairs, $FE xx, that haven't already been allocated.

And it is REALLY FAST doing it like this.

Sure it is. Throwing out compatibility with parsed CBM v2 Basic and starting from scratch would be an interesting exercise, but it seems as if it is not the interesting exercise at hand. The current project has a different form of interesting exercise.

It's a little more complex handling two types embedded into variables and it's a little more complex handling type coercion, but avoiding re-parsing during execution is critical to performance. But now EVERYTHING is a token, every token is one byte so everything is just two 256-byte tables and a jump. ...

So the door to that is already closed ... at least for ROM Basic. There will be multiple User banks available in the ROM, so if someone makes a "Basic from Scratch" available which people find appealing, it has a chance of getting some level of adoption.

nonarkitten commented on September 15, 2024

Ah jeeze, keep it simple.

For internal structures we can use 32-bit NAN boxing for the 16-bit integers. This means we can store integers WITHIN floating point numbers basically for free.

If your definition of "for free" is "3 additional bytes per 16bit integer is free", we have different definitions of free.

Two. Use proper floats. And arrays of chars and words would not have floats per element, this is just at the var-level.

Adding the exception check and a handful of math routines would not add a lot to the whole ROM. NAN Boxing gives us 128 "types" of 16-bit values. So we can have string pointers, structures, bla bla bla, all within the same 32-bit value.

Remember that in the C64 floating point you are talking about, exponent $FF is just another exponent and there are no subnormal values ... all values with a 0 in the biased exponent are zero. There doesn't seem to be a NaN to box.

Then fix it :)

For tokenization, I already did this for an interpreter I use internally at work -- although I handle decimals (0-9), bytes (-128 to 127), words and longs as well as floats, all separately. This ensures that not only am I using the least amount of space possible, but also using LESS SPACE than the original ASCII form took -- and that is in the SPIRIT of BASIC tokenization (I also strip spaces).

The token codes already allocated to CBM V3.5 and CBM v7 are reserved. What is free are the extension CBM v7 token pairs, $FE xx, that haven't already been allocated.

That's fine -- you only need one.

And it is REALLY FAST doing it like this.

Sure it is. Throwing out compatibility with parsed CBM v2 Basic and starting from scratch would be an interesting exercise, but it seems as if it is not the interesting exercise at hand. The current project has a different form of interesting exercise.

I wouldn't say incompatible.

It's a little more complex handling two types embedded into variables and it's a little more complex handling type coercion, but avoiding re-parsing during execution is critical to performance. But now EVERYTHING is a token, every token is one byte so everything is just two 256-byte tables and a jump. ...

So the door to that is already closed ... at least for ROM Basic. There will be multiple User banks available in the ROM, so if someone makes a "Basic from Scratch" available which people find appealing, it has a chance of getting some level of adoption.

Your lack of imagination is boring. Fine, unsubbed.

BruceMcF commented on September 15, 2024

It's not a question of imagination, it's a question of what the design team is trying to accomplish. If you can imagine a better speed optimization for CBM V2 Basic literal parsing, go for it. I'll happily jump on that bandwagon.

But imagining a faster Basic by starting from scratch is an already solved problem, even in the eighties... BBC is roughly twice as fast as CBM v2.

EgonOlsen71 commented on September 15, 2024

BBC is roughly twice as fast as CBM v2

Yes, because it runs at 2 MHz compared to the roughly 1 MHz that the C64 runs at. If you compare it with the C128 in fast mode (2 MHz as well), then the advantage shrinks down to maybe 10%, which is basically (no pun intended) the amount that Basic V7 is slower than Basic V2.

BruceMcF commented on September 15, 2024

Yes, because it runs at 2 MHz compared to the roughly 1 MHz that the C64 runs at. If you compare it with the C128 in fast mode (2 MHz as well), then the advantage shrinks down to maybe 10%, which is basically (no pun intended) the amount that Basic V7 is slower than Basic V2.

I've actually tested this. A PAL C64 runs my Prime Number generator in 32 seconds, and 31 seconds on NTSC. A VIC-20 runs it in 26, and a 2 MHz BBC Micro runs it in 16.4 seconds. So the BBC is actually slightly slower per-clock than the Commodore 64.

So this is correct: while BBC BASIC has some nice features, it's not necessarily faster than MS BASIC, and we shouldn't lean on it as any sort of performance metric.

I've never had one, but it is interesting to know that the BBC devotees are mistaken about the clock efficiency of that version of Basic. I will have to switch to some other more efficient Basic of the era when I have time to do a perusal of benchmark results.

paulscottrobson commented on September 15, 2024

It's more its structure and long variable names. The reason it always seemed quicker, I suspect, is that adding assembler routines is very easy compared to almost any other micro of that era.

BruceMcF commented on September 15, 2024

It's more its structure and long variable names. The reason it always seemed quicker, I suspect, is that adding assembler routines is very easy compared to almost any other micro of that era.

Yes ... benchmarks are one thing, but the easier it is to implement a tight inner loop as an assembly language routine, the easier it is to speed up the "20% of the code that takes 80% of the time".
