idas-durham / june
June is a framework for agent based modelling in an epidemiological and geographical context.
License: GNU General Public License v3.0
I am having some trouble setting up tests in which I have control over some of the properties. For example, I want to test what happens when I send a person to hospital depending on their health_information.tag value. However, I cannot simply set that value to what I'd like; is there a simple solution for this?
Right now the files reformat.py and inputs.py are complete Frankensteins. I'd like to refactor them following this logic:
I will change the data folder structure. There will be two subfolders: data/raw/ and data/processed/ . They will be mirror images of each other. All the data directly downloaded will go into data/raw/ ; after we clean it up and process it, it will move to data/processed/ . Bear in mind that data/raw/ can be as specific to our UK scenario as we want it to be; however, when we process data we need to think about simplifying inputs and generalising to other countries.
The scripts used to clean data from raw to processed will be stored at covid/reformat_scripts/ , whose structure will also be a mirror image of data/raw/ .
inputs.py will follow the same logic as world.py . The init will be mostly empty, and each group will have a function like get_inputs_households or get_inputs_companies that will read in all the necessary datasets from data/processed/ . When world initialises the different active groups, it will first call the function needed to get the data for that particular group using inputs.py .
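A rough sketch of what that could look like (the Inputs class shape, the get_inputs_* names and the directory layout here are illustrative, not the actual API):

```python
import os

# Hypothetical sketch of the proposed inputs.py structure: the constructor
# stays mostly empty and each group has its own reader over data/processed/.
class Inputs:
    def __init__(self, processed_dir=os.path.join("data", "processed")):
        self.processed_dir = processed_dir

    def get_inputs_households(self):
        # Read only the datasets the household distributor needs.
        return {"source_dir": os.path.join(self.processed_dir, "households")}

    def get_inputs_companies(self):
        return {"source_dir": os.path.join(self.processed_dir, "companies")}
```

World would then call the relevant get_inputs_* function only when it initialises that group, keeping IO lazy and per-group.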
Currently, we initialize people with an age taken from a uniform distribution inside their age bin given by the NOMIS dataset. This should be improved using more fine-grained data, especially for the last population bin, where the distribution is likely to have a sharp negative slope.
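For example, assuming a hypothetical single-year-of-age table were available, the sampling could look like:

```python
import random

def sample_age(age_counts, rng=random):
    """Draw a single-year age weighted by fine-grained population counts.

    age_counts: dict mapping age -> number of people (a hypothetical input,
    e.g. a single-year-of-age census table) instead of a wide age bin.
    """
    ages = list(age_counts)
    weights = list(age_counts.values())
    return rng.choices(ages, weights=weights, k=1)[0]
```

This would naturally capture the negative slope in the last bin, since the per-year counts taper off rather than being flat.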
import csv
from typing import Dict

class CommuteGenerator:
def __init__(
self,
regional_generators: Dict[str, RegionalGenerator]
):
"""
Generate a mode of transport that a person uses in their commute.
Modes of transport are chosen randomly, weighted by the numbers taken
from census data for each given Output area.
Parameters
----------
regional_generators
A dictionary mapping Geography msoareas to objects that randomly
generate modes of transport
"""
self.regional_generators = regional_generators
def regional_gen_from_msoarea(self, msoarea: str) -> RegionalGenerator:
"""
Get a regional generator for an Output Area identified
by its output msoarea, e.g. E00062207
Parameters
----------
msoarea
A code for an msoarea
Returns
-------
An object that weighted-randomly selects modes of transport for the region.
"""
return self.regional_generators[
msoarea
]
@classmethod
def from_file(
cls,
filename: str,
config_filename: str = default_config_filename
) -> "CommuteGenerator":
"""
Parse configuration describing each included mode of transport
along with census data describing the weightings for modes of
transport in each output area.
Parameters
----------
filename
The path to the commute.csv file.
This contains data on the number of people using each mode
of transport.
config_filename
The path to the commute.yaml file
Returns
-------
An object used to generate commutes
"""
regional_generators = dict()
with open(filename) as f:
reader = csv.reader(f)
headers = next(reader)
msoarea_column = headers.index("geography code")
modes_of_transport = ModeOfTransport.load_from_file(
config_filename
)
for row in reader:
weighted_modes = list()
for mode in modes_of_transport:
weighted_modes.append((
int(row[
mode.index(headers)
]),
mode
))
msoarea = row[msoarea_column]
regional_generators[msoarea] = RegionalGenerator(
msoarea=msoarea,
weighted_modes=weighted_modes
)
return CommuteGenerator(
regional_generators
)
I'm sensing a theme with my git issues.
Feels to me like this is another example of something that doesn't need to be a class, but can just be stored as a list of dictionaries wherever it is used? Again, we'd need to move the config setup somewhere else.
Will tackle this once we have test coverage over the whole code.
parameters.py seems to load configuration every time an object is instantiated. This causes an excessive number of IO operations which is making things run very slowly.
As part of a PR I've added a caching mechanism which shaves 10 seconds off the fullrun test.
It would be better if configuration were loaded and then passed to objects to instantiate them. That way, a dictionary loaded from YAML can be used to instantiate a class in a fairly concise way:
cls(**dict)
In this case the dictionary keys must at least include every required argument and must not include any arguments that don't exist. I'd argue this is safer, cleaner and more concise than looping through the keys and values of a dictionary and calling setattr.
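A minimal illustration (the Transmission class and its fields here are made up for the example; in practice the dict would come from yaml.safe_load):

```python
# Sketch of instantiating a class directly from a config dictionary.
# The class and parameter names below are illustrative only.
class Transmission:
    def __init__(self, probability, recovery_rate=0.1):
        self.probability = probability
        self.recovery_rate = recovery_rate

# In real code this dict would be the result of yaml.safe_load(f).
config = {"probability": 0.3, "recovery_rate": 0.2}
transmission = Transmission(**config)
```

A misspelled or missing key fails immediately with a TypeError at construction, instead of silently creating a stray attribute via setattr.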
import yaml
from typing import List

class ModeOfTransport:
__all = dict()
def __new__(cls, description):
if description not in ModeOfTransport.__all:
ModeOfTransport.__all[
description
] = super().__new__(cls)
return ModeOfTransport.__all[
description
]
def __init__(
self,
description
):
self.description = description
def index(self, headers: List[str]) -> int:
"""
Determine the column index of this mode of transport.
The first header that contains this mode of transport's description
is counted.
Parameters
----------
headers
A list of headers from a CSV file.
Returns
-------
The column index corresponding to this mode of transport.
Raises
------
An assertion error if no such header is found.
"""
for i, header in enumerate(headers):
if self.description in header:
return i
raise AssertionError(
f"{self} not found in headers {headers}"
)
def __eq__(self, other):
if isinstance(other, str):
return self.description == other
if isinstance(other, ModeOfTransport):
return self.description == other.description
return super().__eq__(other)
def __hash__(self):
return hash(self.description)
def __str__(self):
return self.description
def __repr__(self):
return f"<{self.__class__.__name__} {self}>"
@classmethod
def load_from_file(
cls,
config_filename=default_config_filename
) -> List["ModeOfTransport"]:
"""
Load all of the modes of transport from commute.yaml.
Modes of transport are globally unique. That is, even if the function
is called twice identical mode of transport objects are returned.
Parameters
----------
config_filename
The path to the mode of transport yaml configuration
Returns
-------
A list of modes of transport
"""
with open(config_filename) as f:
configs = yaml.load(f, Loader=yaml.FullLoader)
return [
ModeOfTransport(
config["description"]
)
for config in configs
]
Is there any reason that this needs to be a class as opposed to a string? Looking within the commute module it seems a bit overkill to make this a class.
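A sketch of the class-free alternative, assuming modes only need a label and a census weight (the labels and weights below are made up, not real census figures):

```python
import random

# Modes of transport as plain strings paired with weights, replacing both
# the ModeOfTransport class and the per-region generator object.
weighted_modes = [
    (52, "Driving a car or van"),
    (17, "On foot"),
    (9, "Bus, minibus or coach"),
]

def sample_mode(rng=random):
    # Weighted random choice over the plain-string modes.
    weights = [weight for weight, _ in weighted_modes]
    modes = [mode for _, mode in weighted_modes]
    return rng.choices(modes, weights=weights, k=1)[0]
```

The flyweight behaviour of __new__ would come for free, since equal strings already compare and hash identically.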
Once I'm more familiar with the code-base I'll have a go at sorting this.
There are some cases that might slip through the if-statements in this routine:
https://github.com/JosephPB/covidmodelling/blob/35e6f0cb73863ecce28c84aca82d2b42d19a44f3/covid/groups/companies/company_distributor.py#L40
E.g. all companies in an industry sector can already be full in a certain MSOArea, and therefore an employable person won't be placed into a company. In that case we need to place them thoughtfully into another company, or we could think about leaving them unemployed if that mirrors the Census data.
Currently tests are failing all the time. Ideally nothing should be merged into master without all tests passing - this means people can rely on master being a working version of the project.
One way to prevent failing tests on master is to add branch protection rules where all checks must pass before merging is allowed. This way pull requests with broken tests cannot be merged until they are fixed.
In this line of the code:
https://github.com/JosephPB/covidmodelling/blob/35e6f0cb73863ecce28c84aca82d2b42d19a44f3/covid/groups/people/person_distributor.py#L298
we currently bind people to work inside the simulated region. Thus we overwrite their real-life MSOArea of work so that we can use them in companies within the simulation.
Can you think of a fair argument to change this and implement it?
Hi,
I noticed the list comprehension in distribute_non_mandatory_kids_to_school consumes about 12% of the run time when running over the NorthEast dataset:

n_pupils_age = len([pupil.age for pupil in school.people if pupil.age == person.age])
I wonder if the counting could be accelerated by adding a dictionary attribute to School, such as:

```python
class School(Group):
    ...
    self.age_min = age_min
    self.age_max = age_max
    self.age_structure = {a: 0 for a in range(age_min, age_max + 1)}
```

and then reading from it and incrementing it in distribute_non_mandatory_kids_to_school as such:
such:
```python
...
for i in range(0, self.MAX_SCHOOLS):  # look for a non-full school
    school = self.closest_schools_by_age[person.age][i]
    # check number of students in that age group
    n_pupils_age = school.age_structure[person.age]
    if school.n_pupils >= school.n_pupils_max or n_pupils_age >= (
        school.n_pupils_max / (school.age_max - school.age_min)
    ):
        schools_full += 1
    else:
        break
if schools_full == self.MAX_SCHOOLS:  # all schools are full
    continue
else:  # just keep the school saved in the previous for loop
    pass
school.people.append(person)
school.age_structure[person.age] += 1
```
Shall we include an option to create more verbose output to debug world creation?
Person sex is currently 0 for a man and 1 for a woman, owing to the original dataset. When this data is parsed it ought to be converted to something human-readable to reduce the chance of mistakes.
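One possible conversion at parse time, sketched with a standard-library Enum (the Sex name is a suggestion, not existing code):

```python
from enum import Enum

class Sex(Enum):
    # Values match the original dataset's 0/1 encoding.
    MALE = 0
    FEMALE = 1

def parse_sex(raw):
    # Convert the raw 0/1 (possibly a string) to a readable value once,
    # at parse time, so the rest of the code never sees bare integers.
    return Sex(int(raw))
```

Downstream comparisons like `person.sex is Sex.FEMALE` are then self-documenting, and an out-of-range value raises immediately.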
Especially the classes that are now used for infection (transmission, symptoms, infection and interaction).
The function create_two_parent_household should take into account the age of the parents, and match them with a certain Gaussian width, for example.
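A sketch of such a Gaussian match (the width and age limits are assumed placeholder values, not fitted to data):

```python
import random

def match_partner_age(parent_age, sigma=3.0, age_min=18, age_max=99, rng=random):
    """Draw the second parent's age around the first parent's age.

    Sketch only: a Gaussian of width sigma (an assumed value), clipped to
    a plausible adult age range.
    """
    age = int(round(rng.gauss(parent_age, sigma)))
    return max(age_min, min(age_max, age))
```

The clipping keeps the draw inside the household's valid adult range even for extreme Gaussian tails.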
The refactor of Symptoms / Transmission assumes the time is passed into their functions, and the Infection class now tracks when the infection begins (it stores a start time) allowing it to pass delta_t to the symptoms / transmission methods.
We need to pass the actual time in the world to Infection so the methods below can create new infections correctly.
def infect_person_at_time(self, person, time):
"""Infects someone by initializing an infeciton object with the same type
and parameters as the carrier's infection class.
Arguments:
person (Person) has to be an instance of the Person class.
"""
infection = self.new_infection_at_time(time=time)
person.health_information.set_infection(infection=infection)
def update_at_time(self, time):
if self.last_time_updated <= time:
delta_time = time - self.start_time
self.last_time_updated = time
self.transmission.update_probability_from_delta_time(delta_time=delta_time)
self.symptoms.update_severity_from_delta_time(delta_time=delta_time)
self.infection_probability = self.transmission.probability
From what I can tell, this requires us to simply pass the timer's time down from the update_status_lists() methods that are called in world. I'll implement this behaviour tomorrow if need be, alongside attempting to simplify the Interaction class.
If we set 3 timesteps of 8 hours in the config file we get the following error
File "time.py", line 46, in get_shifts_duration
return self.time_config['step_duration'][self.type_day][self.shift+1]/24.
KeyError: 4
I think it's still the floating-point precision problem, i.e. int(1.9999999999999998) == 1.
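If that's the cause, rounding before truncating would fix it; a sketch with a hypothetical helper (not the actual time.py API):

```python
# int() truncates towards zero, so a float that should be exactly 2 but
# accumulated as 1.9999999999999998 becomes 1. Rounding first recovers the
# intended integer index. shift_index is an illustrative name.
def shift_index(elapsed_hours, step_hours):
    return int(round(elapsed_hours / step_hours))
```

This only applies where the true value is known to be an integer; for arbitrary fractions, rounding would change behaviour.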
The way that we're updating groups is slow because it has to do this list comprehension every time we call this function. It's much slower than the way we were doing it before; for example, running in just box mode for 10000 people takes hours rather than minutes.
Maybe it would be better to store this list somewhere rather than calculating it every time?
In households, contact matrices should reflect the structure of a household. If the parents are older, they should still have the same number of contacts with their children.
We need to decompose groups into subgroups that represent different functions in the group and use this in our interaction models.
There is data on different methods of commuting that can be assigned to people in person_distributor.py and added to each person as an attribute.
An example of how the data on commuting is read in is here, but can be edited:
The ParameterInitializer is used to set up the parameters of model components (e.g. Transmission):
class Transmission(ParameterInitializer):
def __init__(self, timer, user_parameters, required_parameters):
super().__init__("transmission", user_parameters, required_parameters)
self.timer = timer
if timer != None:
self.infection_start_time = self.timer.now
self.last_time_updated = self.timer.now # for testing
self.probability = 0.0
The parameters themselves are loaded via yaml config files. It would be better if this class were instantiated with the parameters as input to its constructor, e.g.:
class Transmission:
def __init__(self, timer, transmission_probability=0.1):
self.timer = timer
self.transmission_probability = transmission_probability
if timer != None:
self.infection_start_time = self.timer.now
self.probability = 0.0
Assuming the transmission probability is treated as a constant, there seems no obvious reason to me why it needs to be loaded via yaml configs and the ParameterInitializer. This just makes the class difficult to change and test, and dependent on unnecessary config files.
The main purpose of the ParameterInitializer seems to be setting up the class with the input parameters (e.g. transmission_probability) not as constant values (e.g. floats) but instead as parameters defined by distributions (Uniform, Gaussian).
Am I correct in thinking that this setup of the parameters is so that they can be initialised with 'random values' drawn via the priors for model-fitting (e.g. using a non-linear search)? Put another way, once these parameters have physical values (i.e. floats drawn via the distributions), those values are then used for a full simulation (e.g. creating a world, running the simulation, tracking how the infection spreads, etc.)?
If this is the case I can simplify all the model classes (transmission, symptoms, interaction) by interfacing the code with a probabilistic programming language (I will explain this in more detail in a telecon).
So, I guess my question is are these things distributions for a different reason than the one I described above? Or are they set up as distributions for some aspect of the world simulation that I'm missing?
At the moment there are parts of the code that use an .id attribute to label an object and some that use .name . We should use the same convention throughout and thus decide whether we think "the shorter the better" or otherwise.
In covid/groups/areas/area.py
the commute_generator calls for_code
, but that function seems to not have been implemented. I have commented out everything relating to commuting.
Comment here with a suggestion for the code's name.
RULES:
Vote by giving a thumbs up.
Deadline: Monday 4th of May, 11 am UK time.
Reward: Two bottles of some of the finest spirits, courtesy of @FrankKrauss and @rgbower .
If I run all the tests together then test_fullrun passes. If I run only this test it fails.
The reason it fails is that it tries to assign a person to hospital but it can't because hospitals don't get created unless box_mode=True.
I assume hospitals should be created regardless of mode?
The inconsistency in whether people are assigned to hospital or not is likely to be due to some file or configuration that is loaded using a path relative to the current working directory. This should be made relative to a file such that tests have the same behaviour regardless of where they are run from.
CollectiveInteraction
InfectionSelector
are both missing, so tests aren't passing. It looks like CollectiveInteraction may have been renamed, since there is an InteractionCollective class, although that is missing some methods called in the tests.
We can see how the R0 in our logger when running our model in box mode compares to the R0 expected for an SIR model. Given that we can do a one-to-one comparison between our box and the SIR model, these should agree.
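For reference, the SIR model has R0 = beta / gamma; a minimal forward-Euler integration (with illustrative parameter values, not ones fitted to our model) gives the curve to compare the box-mode logger against:

```python
# Minimal forward-Euler SIR integration for the box-mode comparison.
# beta and gamma are illustrative placeholder values.
def run_sir(beta, gamma, s0, i0, removed0, dt=0.01, steps=10000):
    s, i, r = float(s0), float(i0), float(removed0)
    n = s + i + r
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

beta, gamma = 0.3, 0.1
basic_reproduction_number = beta / gamma  # R0 = 3.0 for these values
```

Matching beta and gamma to the box-mode interaction parameters, the logger's measured R0 should converge to beta / gamma.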
Check qualitatively how the disease spreads in households only, school only etc... for different interactions and different parameters
Assign people tagged as healthcare workers to hospitals following flow data
Allocate workers with the teacher tag to schools according to flow data
This issue is mostly independent of the rest of the code. It involves understanding contact matrices and extracting them from the BBC pandemic survey paper (https://www.medrxiv.org/content/10.1101/2020.02.16.20023754v2); they can be found in the Supplementary material section. Very importantly, the survey does not include children younger than 14 as participants, so we need to somehow extrapolate the rest of the matrix. Potentially it could be combined with POLYMOD, as they do in the BBC pandemic paper.
The following line shows that we skip adding 18-19 year olds to schools if the schools are already over their capacity.
https://github.com/JosephPB/covidmodelling/blob/6d867421001c342cf29bb2d42dbcba6c31f6b0bd/covid/groups/schools/school_distributor.py#L66
This means they would have nothing to do and would stay at home all day, every day. So let's try to give them work by including them as done here:
https://github.com/JosephPB/covidmodelling/blob/6d867421001c342cf29bb2d42dbcba6c31f6b0bd/covid/groups/people/person_distributor.py#L271
and in the functions _assign_industry_specific
and _assign_industry
in the same file further down.
This means we need to include these functions in the school_distributor.py file to change the person attributes.
Here:
https://josephpb.github.io/covidmodelling/code_structure.html
It stipulates we use setters and getters because of porting to C++. Does this help with some kind of automated porting?
If not I'd be inclined to adopt a more pythonic syntax.
You can use an underscore prefix to indicate privacy:
self._attribute = value
Properties can then be used to make the attribute public but read-only:
@property
def attribute(self):
return self._attribute
Finally, to make it fully gettable and settable we can use a property setter:
@attribute.setter
def attribute(self, attribute):
self._attribute = attribute
I've changed the python in world.py to replace the non-existent config_companies.yaml (already raised in an issue) with config_example.yaml. The code crashes:
frank@frank-XPS-15-9570:~/covidmodelling/covid$ python3 world.py
Initializing world...
Reading inputs...
Traceback (most recent call last):
File "world.py", line 388, in
world = World(config_file=os.path.join("..", "configs", "config_example.yaml"))
File "world.py", line 33, in init
self.inputs = Inputs(zone=self.config["world"]["zone"])
File "/home/frank/covidmodelling/covid/inputs.py", line 60, in init
np.unique(self.oa2msoa_df["MSOA11CD"].values)
File "/home/frank/covidmodelling/covid/inputs.py", line 150, in read_companysize_census
header=0,
File "/home/frank/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/frank/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/frank/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in init
self._make_engine(self.engine)
File "/home/frank/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/frank/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in init
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.cinit
File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File /home/frank/covidmodelling/covid/../data/census_data/middle_output_area/EnglandWales/companysize_msoa11cd_2019.csv does not exist: '/home/frank/covidmodelling/covid/../data/census_data/middle_output_area/EnglandWales/companysize_msoa11cd_2019.csv'
globals() is used so a class can be instantiated by name. This is bad practice because it means removing an apparently unused import can break the code. It also violates the Python philosophy of making behaviour explicit.
It would be better to define or generate a dictionary explicitly mapping class names to classes.
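A sketch of such an explicit registry (the class names here are placeholders, not the real infection classes):

```python
# Explicit name-to-class mapping replacing globals() lookups.
# These placeholder classes stand in for the real symptoms/infection classes.
class SymptomsConstant:
    pass

class SymptomsGaussian:
    pass

SYMPTOMS_CLASSES = {
    "SymptomsConstant": SymptomsConstant,
    "SymptomsGaussian": SymptomsGaussian,
}

def symptoms_class_from_name(name):
    # Fail loudly with a clear message instead of a bare KeyError/NameError.
    try:
        return SYMPTOMS_CLASSES[name]
    except KeyError:
        raise ValueError(f"Unknown symptoms class: {name}")
```

With this, the classes are referenced directly in the module, so linters and IDEs see the imports as used, and the set of instantiable classes is visible in one place.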
Households currently only have certain compositions, e.g. single parent, two parent, student or elderly couple. This means that inter-generational mixing does not occur over more than one generation, and that is only in cases where children live with their parents.
It would be good to create another 'generic' household function that looks at the data distributions and matches people across 2 generations (children living with parents and some grandparents or sub-sets thereof).
The starting point for this would be to look at the household data currently used and how such distributions could be drawn.
Distributing households can be found here:
Add unit tests to check that people are rightly moved to hospitals, and de-activated from other groups
Use household composition dataset (Nomis QS113UK) to define a "minimal" configuration of households.
Use the remaining people to populate households in which they could potentially live. This will allow multi-generational households.
Match the age of couples given https://en.wikipedia.org/wiki/Age_disparity_in_sexual_relationships , which is almost identical to the UK distribution and can be used more generally for other countries.
Match the age of mother and child given by https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/birthsbyparentscharacteristics . The same dataset also allows for matching siblings' ages.
Create care homes using https://www.nomisweb.co.uk/census/2011/qs421uk . First populate them with the oldest people in that output area.
Use Nomis tables QS406EW (household size) and QS110EW (household composition per age) as a test of our algorithm.
Test both the collective (as in Imperial's and our own) and contact matrix interaction in the box setting. If possible, compare with an SIR model fit. Especially check that the R0 in the logger has a correspondence with the SIR parameters.
So, to get rid of the data.zip annoyances, we should set up git lfs, as we did last time.
The pricing seems to be $5 for every 50 GB of bandwidth we use per month.
Currently, it is hard to specify exactly which geography you want in the code, and you always need to initialize a world. As a first step, one should be able to initialize just the "geography" part of the code like this:
area1 = Area.from_name("E0000XXX")
area2 = Area.from_name("E1231321")
area3 = Area(n_residents=50, coordinates=....)
areas = Areas([area1, area2, area3 ])
geography = Geography(areas)
world = World(geography = geography)
Of course it would also be possible to do
geography = Geography.from_region("NorthEast")
which would initialize all the output areas and super areas in the north east.
I will start working on this as it is needed to make other things work and testing feasible.
If people are to be hospitalised, move them to the nearest hospital with capacity in their region.
Adapt the code to people possibly staying at home if they have symptoms (perhaps with a certain probability configured in the config file?), and send them to hospital if necessary (so that they are not active inside the household)
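A sketch of that decision, assuming a hypothetical stay-at-home probability in the config (all names here are made up):

```python
import random

def place_symptomatic(must_hospitalise, stay_home_prob=0.8, rng=random):
    """Decide where a symptomatic person goes.

    stay_home_prob is a hypothetical config value; severity handling is
    simplified to a single must_hospitalise flag for this sketch.
    """
    if must_hospitalise:
        return "hospital"  # de-activate them from household and other groups
    if rng.random() < stay_home_prob:
        return "home"      # stays home, not active in other groups
    return "active"        # mild enough to carry on as normal
```

The returned tag could then drive which groups the person is removed from for the timestep.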