Comments (4)
to be pedantic - where does the casting occur:
- elara.input reads the schedule input from xml, so I'm pretty sure the indices in elara are strings at this point (NOT CHECKED THOUGH).
- I like string ids.
- elara.event_handlers does its thing and outputs a big csv of required counts. But I still expect the output indices to be string type (NOT CHECKED). Therefore I do not expect pandas to have applied any scientific notation, and I expect the csv indices to be strings.
- elara.benchmark reads in the bm data as a dict from json. So they presumably get nice integers if we want or strings if we prefer - I gather you are using strings.
- However, elara.benchmark reads in the csv of sim results (from elara.event_handlers) with pd.read_csv and somehow gets something that isn't a string???
- One of my above steps must be wrong.
For your solution, you can more simply specify the datatype for individual columns using `pd.read_csv(path, dtype={"id": str})`, I think. Based on my logic above I am happy for you to do this, as I am expecting a str regardless. But my logic is wrong somewhere, so who knows.
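A minimal sketch of that suggestion, assuming the ID column is named "id" (the real column name in the elara output may differ) and using made-up rows in place of the real csv:

```python
import io
import pandas as pd

# Hypothetical CSV standing in for the elara.event_handlers output;
# the column name "id" and the counts are assumptions for illustration.
csv_text = "id,0,1\n12117299242561318912,3,4\n12117299242561318913,5,6\n"

# Force the ID column to be read as text, then use it as the index.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str}).set_index("id")

ids = list(df.index)
print(ids)
```

Setting the index after the read keeps the dtype request on a named column, so the IDs are never parsed as numbers.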
Ultimately happy for you to make changes if the tests still pass.
If we were to be very careful we could add in a test that represents this problem. But we would have to build new test data and so on. So maybe not unless you are twiddling your thumbs.
I will also talk to Kasia about avoiding massive integers and being consistent with type, ideally strings.
from elara.
Hey, a response to your points:
- Yes, this is my understanding as well.
- Yes, at this point the CSV has the correct IDs.
- Yes, this is correct and the IDs are valid when read from the benchmark.
- Correct, this is where the error occurs:
results_df = pd.read_csv(path, index_col=0)
results_df = results_df.groupby(results_df.index).sum()
results_df = results_df[[str(h) for h in range(24)]]
results_df.index = results_df.index.map(str)
results_df.index.name = 'stop_id'
What is happening line by line
results_df = pd.read_csv(path, index_col=0)
loads the index as a numpy.float64. At this point Python displays the float as 1.211729924256132e+19. The type is inferred as float because the data in the csv takes the unquoted form 12117299242561318912. Maybe quoting it as "12117299242561318912" would fix this?
Next line of interest is
results_df.index = results_df.index.map(str)
This converts the index from numpy.float64 to str, which generates "1.211729924256132e+19".
Therefore, it finds no matches, because "1.211729924256132e+19" != "12117299242561318912".
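The mismatch can be seen without pandas. This is only a sketch: a plain Python float stands in for numpy.float64, and the exact digits printed may differ from the value quoted above.

```python
stop_id = "12117299242561318912"

# Parse the ID as a float, as happens when the column type is inferred as float.
as_float = float(stop_id)

# Converting back to text yields scientific notation, not the original digits.
round_tripped = str(as_float)
matches = round_tripped == stop_id

print(round_tripped, matches)
```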
I have changed my solution to use dtype={0:str} as you suggested :) Very happy to write a test as well. But would we have to make a test to cover this issue for each benchmark?
Action
Do you want me to make a branch that changes all cases of read_csv from the csv dump to dtype={0:str}, as this issue may occur for every benchmark type? Or should I just make the change for PTInteraction? Or can I just make the changes on the current new-zealand-branch? (At this point that branch would cover several things: adding each of the four new NZ benchmarks as well as this load change.)
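If the change goes into every benchmark, one option is a single shared loader. A rough sketch only: `load_results`, the "stop_id" column name and the sample data are all hypothetical and not part of elara.

```python
import io
import pandas as pd

HOURS = [str(h) for h in range(24)]

def load_results(source, id_column="stop_id"):
    """Hypothetical shared loader: results csv -> hourly counts indexed by string ID."""
    df = pd.read_csv(source, dtype={id_column: str}).set_index(id_column)
    # Combine duplicate IDs, then keep only the 24 hourly columns.
    return df.groupby(level=0).sum()[HOURS]

# Made-up data: the same stop appears twice with a count of 1 in every hour.
header = "stop_id," + ",".join(HOURS)
row = "12117299242561318912," + ",".join(["1"] * 24)
results = load_results(io.StringIO("\n".join([header, row, row])))

# Hypothetical benchmark counts keyed by string ID, as read from json.
benchmark = {"12117299242561318912": 2}
matched = [stop for stop in benchmark if stop in results.index]
print(matched)
```

Because the index is already a string, the later index.map(str) step becomes a no-op and the benchmark keys match directly.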
from elara.
awesome explanation.
I have trouble reproducing, eg:
In [96]: df = pd.DataFrame(["99999999999999999999999999999999999"]*5, columns=["a"])
In [97]: df
Out[97]:
a
0 99999999999999999999999999999999999
1 99999999999999999999999999999999999
2 99999999999999999999999999999999999
3 99999999999999999999999999999999999
4 99999999999999999999999999999999999
In [98]: df.a
Out[98]:
0 99999999999999999999999999999999999
1 99999999999999999999999999999999999
2 99999999999999999999999999999999999
3 99999999999999999999999999999999999
4 99999999999999999999999999999999999
Name: a, dtype: object
In [99]: df.to_csv(path)
In [100]: df = pd.read_csv(path)
In [101]: df.a
Out[101]:
0 99999999999999999999999999999999999
1 99999999999999999999999999999999999
2 99999999999999999999999999999999999
3 99999999999999999999999999999999999
4 99999999999999999999999999999999999
Name: a, dtype: object
But I trust you and I like strings, so please:
- force string index - please do this for every bm - i think it's a good test
- don't worry about new tests
- happy for you to include on your NZ branch
from elara.
Excellent, I will make the changes today
from elara.