Comments (21)
@adrian-carro Looks like the random number generator instance rand is declared as a public variable, which means it is shared with all of the code. It probably makes the most sense to create a single instance of the random number generator and then pass that instance into the constructor of any object that needs a source of randomness.
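The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the model's actual code: the classes Agent and SharedPrngSketch are hypothetical, and java.util.Random stands in for whatever generator the model really uses.

```java
import java.util.Arrays;
import java.util.Random;

class SharedPrngSketch {
    // Builds two agents that share one seeded generator and returns their draws.
    static int[] run(long seed) {
        Random prng = new Random(seed);   // the single generator instance
        Agent first = new Agent(prng);    // injected through the constructor
        Agent second = new Agent(prng);
        return new int[] { first.decide(), second.decide(), first.decide() };
    }

    public static void main(String[] args) {
        // The same seed should give the same sequence of draws on every run.
        System.out.println(Arrays.equals(run(1L), run(1L)));
    }
}

// Hypothetical agent: it never creates or looks up a generator itself.
class Agent {
    private final Random prng;

    Agent(Random prng) {
        this.prng = prng;
    }

    int decide() {
        return prng.nextInt(100);
    }
}
```

With this layout there is no public static generator to reach for, so every draw can be traced back to the one seeded instance.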
from housing-model.
Also watch out for any operations where you are iterating over HashSet or HashMap data structures, as the order in which the iteration is done is not guaranteed to be consistent between model runs.
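A small sketch of one common way around this, assuming the iterated objects do not override hashCode() (in which case a HashSet buckets them by identity hash, which can change from one JVM run to the next). The House class here is hypothetical. LinkedHashSet (and LinkedHashMap) iterate in insertion order, so the order depends only on what the program did.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IterationOrderSketch {
    // Hypothetical model object that does not override hashCode().
    static class House {
        final int id;

        House(int id) {
            this.id = id;
        }
    }

    // Inserts houses with ids 0..n-1 and returns the ids in iteration order.
    static List<Integer> iterationOrder(int n) {
        Set<House> houses = new LinkedHashSet<>(); // insertion-ordered set
        for (int i = 0; i < n; i++) {
            houses.add(new House(i));
        }
        List<Integer> ids = new ArrayList<>();
        for (House house : houses) {
            ids.add(house.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

If random numbers are drawn inside such a loop, a stable iteration order keeps each draw attached to the same object across runs.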
from housing-model.
Is there a test case? What is the best way to generate an output file that we can quickly check is the same after two model runs?
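One possible shape for such a check, sketched under the assumption that the two runs each write a plain-text output file: compare the files line by line and report the first line that differs. The class and method names are hypothetical, and the file paths are passed in by the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class OutputComparisonSketch {
    // Returns the 1-based number of the first differing line, or -1 if identical.
    static int firstDifferingLine(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i + 1;
            }
        }
        // Same content so far: the files differ only if one has extra lines.
        return a.size() == b.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OutputComparisonSketch <fileA> <fileB>");
            return;
        }
        List<String> a = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> b = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        int line = firstDifferingLine(a, b);
        System.out.println(line < 0 ? "identical" : "first difference at line " + line);
    }
}
```

Reporting the first differing line, rather than just pass or fail, also shows at which time step two runs start to diverge.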
from housing-model.
With PR #52, any problem with random number generation would only affect branch ML-calibration. I'm thus reducing priority to low, for now, though I'll check this issue towards the end of this week.
from housing-model.
@davidrpugh I am reinstating high priority for this issue, as I've just had time to check the random numbers in the main branch and I again get inconsistencies between different runs. For example, if one prints a prng.nextInt() after 5 time steps, different runs will lead to a different number being printed.
from housing-model.
Just so that I can reproduce this issue myself: are you saying that if you insert a System.out.print(prng.nextInt()) in the main method of the Model.java class, then after t=5 time steps, different runs will produce different outputs?
Which output file should I look at to see the differences?
from housing-model.
So, if you place a System.out.print(prng.nextInt()) within the main method in the Model class and you take note, for example, of the 5th number printed, when you compare it between different runs, you'll see different numbers. That proves something is going wrong with the random number generation. Apart from this, of course, you can compare the output files, which are also different after the first several time steps and in several columns.
from housing-model.
Ok. The only way that can happen is if the generator is getting called a different number of times depending on the model run. I have a few ideas about why this is happening, but they will be difficult to diagnose given the current state of the code.
Can you please point me to the single most important example of a model output file that is different across model runs using the same seed?
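One way to test the hypothesis that the generator is advanced a different number of times per run is to count the draws. The sketch below assumes a java.util.Random-based generator (the model's real generator may be a different class). CountingRandom is a hypothetical helper that counts calls to the protected next(int) method, which the public methods of java.util.Random go through.

```java
import java.util.Random;

class CountingRandomSketch {
    // Random subclass that counts how often the underlying generator is advanced.
    static class CountingRandom extends Random {
        private long calls = 0;

        CountingRandom(long seed) {
            super(seed);
        }

        @Override
        protected int next(int bits) {
            calls++; // one increment per low-level draw, not per public method call
            return super.next(bits);
        }

        long getCalls() {
            return calls;
        }
    }

    public static void main(String[] args) {
        CountingRandom prng = new CountingRandom(1L);
        for (int step = 1; step <= 5; step++) {
            prng.nextInt(); // stand-in for one model time step
            System.out.println("step " + step + ": draws so far = " + prng.getCalls());
        }
    }
}
```

Logging the count at the end of each time step in two runs with the same seed would show the first step at which the number of draws differs.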
from housing-model.
Ok, let's take the Output-run1.csv file, row 6, corresponding to Model time = 4 (first column). For two different runs I find different values in the second column, nNonBTLHomeless. This is just an easy-to-find example, representative of what is happening. I am guessing any example is fine, as we just need to detect where the problem is stemming from.
from housing-model.
Is the insertion of the print statement necessary to generate the bug? I spot checked the Output-1.csv file when I made my previous changes and the output appeared identical.
Is there any multi-threading in the model?
from housing-model.
There is no multi-threading, and no, the print statement is not needed; it was just to get information faster, without any need to compare output files. Could you confirm if the Output-1.csv files are identical for you? That would be fairly surprising... why would there be system dependencies?
from housing-model.
I will try to replicate the issue when I get home. I am running on Windows using some version of oraclejdk 8.
from housing-model.
Great, thanks for doing this! I'm on Ubuntu 16.04, and running openjdk version "1.8.0_162".
from housing-model.
@adrian-carro I just ran the main method of the Model.java class for roughly 500 time steps. This generates a directory of output in the Results folder. I then re-ran the main method (without changing anything) to generate a second directory of output in the Results folder. I then compared the resulting Output-run1.csv files. They appear to be identical. For example, here is the record for t=103 from both files:
First simulation I ran generated...
103, 283, 3, 286, 331, 617, 646, 31, 677, 255, 289, 2, 0, 1549, 1270, 12, 0, 7, 0.26614173228346455, 1.0244697019962505, -0.002299730036721126, 320082.397005936, 250030.7937810174, 252088.47594443636, 0.0, 21.326370650946313, 95, 46, 19, 19, 0, 19, 0.15789473684210525, 0.631578947368421, 1.01157359475851, -0.024098208995406822, 294.161880977776, 876.8550416063504, 624.6606537036129, 5.217391304347826, 270, 30, 23, 0.04411977882667759, 933, 185688.6040452648
Second simulation I ran generated...
103, 283, 3, 286, 331, 617, 646, 31, 677, 255, 289, 2, 0, 1549, 1270, 12, 0, 7, 0.26614173228346455, 1.0244697019962505, -0.002299730036721126, 320082.397005936, 250030.7937810174, 252088.47594443636, 0.0, 21.326370650946313, 95, 46, 19, 19, 0, 19, 0.15789473684210525, 0.631578947368421, 1.01157359475851, -0.024098208995406822, 294.161880977776, 876.8550416063504, 624.6606537036129, 5.217391304347826, 270, 30, 23, 0.04411977882667759, 933, 185688.6040452648
Is this the right experimental setup to replicate your results? Perhaps you are running the simulations from the command line rather than from within an IDE?
from housing-model.
@davidrpugh I am using the IDE (IntelliJ IDEA), but I've also tried the command line with the same results: random numbers are not consistent between runs. I'm fairly surprised by the fact that you do not obtain that inconsistency! I know Donovan (one of our PhD students) is having the same problem, as he flagged it yesterday when he started working with the model, and he uses a Mac. Do you have any idea what might be happening here?
from housing-model.
I am also using IntelliJ. I am running the model on my work computer, which runs on Windows.
I want to be exactly clear about what I am doing just to be certain that we are doing the same thing.
1. Right click on the Model.java file and run the main method.
2. After roughly 500 time steps I stop the model.
3. Click the green play button to re-run the simulation.
4. After roughly 500 time steps I stop the model.
5. I then compare the Output-run1.csv files from the relevant directories in the Results folder.
Is this what you are doing? Please try to be very specific in your description.
from housing-model.
Ouch, ok, I can confirm the problem does not appear when I follow your instructions. I have only seen the problem when:
- Running the code from the command line with mvn exec:java -Dexec.mainClass="housing.Model".
- Running the code with a Maven configuration within IntelliJ, with the command line argument clean validate compile exec:java.
For the rest (your points 2 and 3), I proceed exactly as you do.
from housing-model.
@adrian-carro This is useful information! It suggests that the issue arises at compilation time, rather than runtime. Running multiple simulations using the steps that I provided compiles the model once. Doing it your way forces compilation of the code twice.
Do you know if the configuration files are being picked up when the model is compiled or after the model is compiled?
from housing-model.
@davidrpugh The config file should be picked up only at execution, if I am not wrong. However, as I understand it, my option number 1, running the code from the command line with mvn exec:java -Dexec.mainClass="housing.Model", is not compiling the code, but just executing it. At least I had assumed so; please correct me if I am wrong! ;-)
from housing-model.
@davidrpugh I finally managed to find the origin of this problem: there was a small bug in the PriorityQueue2D which was leading to some transactions happening or not happening, as a consequence of which the number of random numbers drawn would change, thus leading to the observed inconsistencies. I now obtain consistent random numbers between runs. Also, I have checked and I now obtain the same stream of random numbers whether directly running the class or using the Maven execution goal. I still don't understand why these two ways of running the code were leading to different results before, but I guess they might simply have different ways of handling errors!
from housing-model.
@adrian-carro Great! Glad that you found the issue. Sounds super tedious. I don't actually think that the 2D priority queue is necessary at all to implement the matching algorithm. We might want to think about removing it and replacing it with something simpler.
from housing-model.