Deion When building PTInteraction benchmarks for project mo

long stop id causes casting error (elara, CLOSED)

arup-group commented on June 1, 2024

long stop id causes casting error

from elara.

Comments (4)

fredshone commented on June 1, 2024

to be pedantic - where does the casting occur:

1. elara.input reads the schedule input from xml, so the indexes in elara at this point will be strings, I'm pretty sure (NOT CHECKED THOUGH). I like string ids.
2. elara.event_handlers does its thing and outputs a big csv of required counts. But I still expect the output indices to be string type (NOT CHECKED). Therefore I do not expect pandas to have done any scientific notation. Therefore I expect the csv indices to be strings.
3. elara.benchmark reads in the bm data as a dict from json. So they presumably get nice integers if we want, or strings if we prefer - I gather you are using strings.
4. However, elara.benchmark reads in the csv of sim results (from elara.event_handlers) with pd.read_csv and somehow gets something that isn't a string???

One of my above steps must be wrong.

For your solution, you can more simply specify the datatype for individual columns using `pd.read_csv(path, dtype={"id":str})`, I think. Based on my logic above I am happy for you to do this, as I am expecting a str regardless. But my logic is wrong somewhere, so who knows.
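A minimal sketch of that suggestion, assuming a results csv whose first column is named `stop_id` and whose other columns are hourly counts. The column name and the inline data are made up for illustration and are not taken from elara:

```python
import io

import pandas as pd

# Hypothetical results csv: column name and values are illustrative only.
csv_text = (
    "stop_id,0,1\n"
    "12117299242561318912,3,4\n"
    "12117299242561318912,1,2\n"
    "42,5,6\n"
)

# Read the id column as text so pandas never infers a numeric type for it.
results_df = pd.read_csv(io.StringIO(csv_text), dtype={"stop_id": str})
results_df = results_df.set_index("stop_id")

# Sum rows that share a stop id, mirroring the groupby in the benchmark code.
results_df = results_df.groupby(results_df.index).sum()

print(results_df.index.tolist())
```

Setting the dtype by column name and then calling `set_index` is one way to avoid relying on how `dtype` interacts with `index_col`; the thread itself uses `dtype={0:str}`.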

Ultimately happy for you to make changes if the tests still pass.

If we were to be very careful we could add in a test that represents this problem. But we would have to build new test data and so on. So maybe not unless you are twiddling your thumbs.

I will also talk to Kasia about avoiding massive integers and being consistent with type, ideally strings.


Georgea75 commented on June 1, 2024

Hey, a response to your points:

1. Yes, this is my understanding as well.
2. Yes, at this point the CSV has the correct IDs.
3. Yes, this is correct and the IDs are valid at read from the benchmark.
4. Correct, this is where the error occurs:

    results_df = pd.read_csv(path, index_col=0)
    results_df = results_df.groupby(results_df.index).sum()
    results_df = results_df[[str(h) for h in range(24)]]
    results_df.index = results_df.index.map(str)
    results_df.index.name = 'stop_id'

What is happening line by line

    results_df = pd.read_csv(path, index_col=0)

loads the index as numpy.float64. At this point Python displays the float as 1.211729924256132e+19. The type is inferred as float because the data in the CSV takes the form 12117299242561318912. Maybe writing the value quoted, as "12117299242561318912", would fix this?

The next line of interest is

    results_df.index = results_df.index.map(str)

This converts the numpy.float64 index to str, which generates "1.211729924256132e+19".

Therefore it finds no matches, as "1.211729924256132e+19" != "12117299242561318912".
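The mismatch can be sketched without pandas at all. This is only an illustration: a plain Python `float` stands in for `numpy.float64`, and the exact digits printed may differ from the value quoted above.

```python
# The id as it appears in the csv and in the benchmark data.
stop_id = "12117299242561318912"

# Stand-in for the id being parsed as a float on read.
as_float = float(stop_id)

# Stand-in for index.map(str): a float this large is rendered in scientific notation.
as_text = str(as_float)

print(as_text)
print(as_text == stop_id)  # False, so a lookup by the original id finds nothing
```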

I have changed my solution to use dtype={0:str} as you suggested :) Very happy to write a test as well. But would we have to make a test to cover this issue for each benchmark?

Action
Do you want me to make a branch and change all cases of read_csv from the csv dump to dtype={0:str}, as this issue may occur for every benchmark type? Or should I just make the change for PTInteraction? Or can I just make the changes on the current new-zealand-branch? (At this point this branch would cover several things: adding each of the four new nz-benchmarks as well as this load change.)


fredshone commented on June 1, 2024

awesome explanation.

I have trouble reproducing, eg:

In [96]: df = pd.DataFrame(["99999999999999999999999999999999999"]*5, columns=["a"])

In [97]: df
Out[97]:
                                     a
0  99999999999999999999999999999999999
1  99999999999999999999999999999999999
2  99999999999999999999999999999999999
3  99999999999999999999999999999999999
4  99999999999999999999999999999999999

In [98]: df.a
Out[98]:
0    99999999999999999999999999999999999
1    99999999999999999999999999999999999
2    99999999999999999999999999999999999
3    99999999999999999999999999999999999
4    99999999999999999999999999999999999
Name: a, dtype: object

In [99]: df.to_csv(path)

In [100]: df = pd.read_csv(path)

In [101]: df.a
Out[101]:
0    99999999999999999999999999999999999
1    99999999999999999999999999999999999
2    99999999999999999999999999999999999
3    99999999999999999999999999999999999
4    99999999999999999999999999999999999
Name: a, dtype: object
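One difference between this session and the original report is the size of the value. The sketch below only compares the two values against the 64-bit integer limits. It does not show what any particular pandas version infers for either range, which is not checked here.

```python
# The id from the original report and the test value used in the session above.
reported_id = 12117299242561318912
test_value = 99999999999999999999999999999999999

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

# The reported id overflows a signed 64-bit integer but fits in an unsigned one.
print(reported_id > INT64_MAX, reported_id <= UINT64_MAX)

# The test value is larger than any 64-bit integer.
print(test_value > UINT64_MAX)
```

A reproduction attempt may therefore behave differently with a value of about 20 digits, such as the reported id, than with the 35-digit value.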

But I trust you and I like strings, so please:

- force string index - please do this for every bm - I think it's a good test
- don't worry about new tests
- happy for you to include on your NZ branch


Georgea75 commented on June 1, 2024

Excellent, I will make the changes today

