computationalprivacy / bandicoot
an open-source python toolbox to analyze mobile phone metadata
License: MIT License
In io.py, line 191, the recharge parsing function uses the format "%Y-%m-%d"; it should be "%Y-%m-%d %H:%M:%S", otherwise the datetime is not loaded correctly. Because the time is automatically set to midnight, all recharges end up in the xxx_recharges__xxx__night__xxx indicators.
def _parse_recharge(data):
    dt = _tryto(lambda x: datetime.strptime(x, "%Y-%m-%d"),
                data['datetime'])
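A minimal sketch of a parser that accepts both formats and tries the full timestamp first. `parse_recharge_datetime` is a hypothetical standalone helper. It is not bandicoot's `_tryto` or `optional_parser`.

```python
from datetime import datetime

# Hypothetical helper (not bandicoot's API): try the full timestamp first,
# then fall back to a date-only value, which strptime places at midnight.
RECHARGE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_recharge_datetime(value):
    """Return a datetime for `value`, or None if no known format matches."""
    for fmt in RECHARGE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # Wrong format (or trailing unparsed text): try the next one.
            continue
    return None
```

The order matters. If you try "%Y-%m-%d" first on "2014-03-02 21:15:00", it raises ValueError because unconverted text remains. Trying the date-only format first would therefore still work, but a date-only value is the only case that should fall back to midnight.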
The version of io.py installed with pip does not have the function optional_parser. That function (line 190) is present in the version of the file available on GitHub.
Without optional_parser, some functions in the recharge module do not work as intended, because recharges are read with day information only, not with a full timestamp.
Hi,
I am using FnF data to generate users' features. However, for all the features (`bc.utils.all(user)`), I observed that `has_calls` is always false and all call-related features are null. While debugging, I found that 33 of the 53 FnF users (with personality results) had calls with a non-null call duration.
For example, user "sp10-01-19" has many calls with a non-null duration among its 275 results, but bandicoot still imports it as shown below, and reporting-has_C is always false:
`sp10-01-19
[x] 52 records from 2010-07-24 09:42:35 to 2011-05-09 10:01:22
[x] 11 contacts
[ ] No attribute stored
[ ] No antenna stored
[ ] No recharges
[x] Has home
[x] Has texts
[ ] No calls
[ ] No network`
What should I do?
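One quick check is to count the interaction values in one of the problematic files. This sketch assumes the bandicoot record layout (interaction, direction, correspondent_id, datetime, call_duration, antenna_id). If rows that have a call duration carry an interaction label other than exactly `call` (for example an empty field or `Call`), that could explain why no calls are detected. This is a guess to verify, not a confirmed cause. `interaction_summary` is a hypothetical helper.

```python
import csv
from collections import Counter


def interaction_summary(path):
    """Count rows per `interaction` value, and rows with a non-empty call_duration.

    Assumes a bandicoot-style header with `interaction` and `call_duration` columns.
    """
    kinds = Counter()
    with_duration = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            kind = (row.get("interaction") or "").strip()
            kinds[kind] += 1
            if (row.get("call_duration") or "").strip():
                with_duration[kind] += 1
    return kinds, with_duration
```

If `with_duration` shows durations under a label other than `call`, normalising that column before calling `bc.read_csv` would be the first thing to try.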
In recharge.py, line 89, average_balance_recharges returns:
return balance / (last_recharge.datetime - first_recharge.datetime).days
If the first and last recharges happen on the same day, the denominator is 0 and the division raises ZeroDivisionError. Maybe 1 should be added to the denominator, like this:
return balance / ((last_recharge.datetime - first_recharge.datetime).days + 1)
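A standalone sketch of the proposed denominator. `average_per_day` is a hypothetical name, and `total` stands in for whatever balance the indicator actually computes.

```python
from datetime import datetime


def average_per_day(total, first, last):
    """Average `total` over the days between `first` and `last`, plus one.

    (last - first).days counts whole 24-hour periods elapsed; adding 1 keeps
    the denominator at least 1, so recharges on a single day do not divide by zero.
    """
    days = (last - first).days + 1
    return total / days
```

For example, three recharges between 1 and 3 January give `(3 - 1) + 1 = 3` days, and recharges within the same day give a denominator of 1.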
First of all, this is a great python module, so thank you.
In my case, I want to apply it (in particular the bc.utils.all function) to a Muslim country. Everything works fine, apart from the fact that the weekend there falls on Friday and Saturday (i.e. days [5, 6] rather than [6, 7]). Is there an easy way to integrate this into the function?
Thanks in advance
fabibru
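The [6, 7] and [5, 6] notation matches Python's ISO weekday numbering (`datetime.isoweekday()`: Monday is 1, Sunday is 7). Here is a minimal sketch of a weekend test that takes the weekend days as a parameter. `is_weekend`, `SAT_SUN` and `FRI_SAT` are hypothetical names. Check the installed version to see whether bandicoot exposes an equivalent setting.

```python
from datetime import datetime

# ISO weekday numbering: Monday=1 ... Friday=5, Saturday=6, Sunday=7.
SAT_SUN = (6, 7)
FRI_SAT = (5, 6)


def is_weekend(dt, weekend_days=SAT_SUN):
    """Return True if `dt` falls on one of `weekend_days` (ISO weekday numbers)."""
    return dt.isoweekday() in weekend_days
```

1 January 2021 was a Friday, so `is_weekend(datetime(2021, 1, 1), FRI_SAT)` is True, while the default Saturday/Sunday weekend returns False.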
@yvesalexandre
Why is the IFrame in the code block below not showing?
import os
from IPython.display import IFrame
bc.visualization.export(U, "GA")
# Output: Successfully exported the visualization to GA
# Return value: 'GA'
IFrame("GA/index.html", '100%', 700)
This code block is also not showing the ego network:
import os
viz_path = os.path.dirname(os.path.realpath(__name__)) + '/viz'
bc.visualization.export(U, viz_path)
from IPython.display import IFrame
IFrame("/files/viz/index.html", "100%", 700)
Could someone please help?
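One possible workaround, offered as an assumption rather than a confirmed cause: a relative path such as "GA/index.html" in an IFrame is not always resolved where the notebook expects, and "/files/..." only works relative to the notebook server's root. Serving the export folder over local HTTP avoids the path-resolution question. `serve_directory` is a hypothetical helper that uses only the standard library (Python 3.7+ for `ThreadingHTTPServer` and the `directory` argument).

```python
import functools
import http.server
import threading


def serve_directory(path, port=0):
    """Serve the files under `path` over HTTP from a background thread.

    port=0 lets the OS pick a free port; the chosen one is server.server_address[1].
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=path)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the server does not keep the kernel alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Usage in the notebook: `server = serve_directory(viz_path, port=4242)`, then `IFrame("http://127.0.0.1:4242/index.html", "100%", 700)`. When you are done, call `server.shutdown()` and `server.server_close()`.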
In line 116 of helper/group.py:
return (d.year, d.day, d.hour, d.minute // 30)
shouldn't it be
return (d.year, d.month, d.day, d.hour, d.minute // 30)
Or am I missing something?
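The collision described here is easy to reproduce. Without the month, two timestamps with the same day number in different months share a key. Both functions are standalone illustrations with hypothetical names.

```python
from datetime import datetime


def half_hour_key_without_month(d):
    # Key as quoted from helper/group.py: the month is missing.
    return (d.year, d.day, d.hour, d.minute // 30)


def half_hour_key(d):
    # Proposed key: including the month makes each half-hour slot unique.
    return (d.year, d.month, d.day, d.hour, d.minute // 30)
```

For instance, 5 January 10:10 and 5 February 10:20 both map to `(2020, 5, 10, 0)` under the first key, but they differ under the second.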
Hello,
I am trying to compute indicators for a user and have put the calls and antennas into the format described in the quick start docs. Here is the antennas file:
For some reason I can't pin down, these antennas' locations are not recognized.
However, when I query the antennas, I can still see their attributes:
Any idea what might be happening here?
In bandicoot/io.py, line 383, you use the `place_id` key:
antennas = dict((d['place_id'], (float(d['latitude']), float(d['longitude'])))
On line 118 of the same file:
elif 'place_id' in data:
    raise NameError("Use field name 'antenna_id' in input files. 'place_id' is deprecated.")
So in the antennas file you want us to use `place_id` in the header, but in the records file you want us to use `antenna_id`. In the example in the docs, the header of the antennas file uses `antenna_id` as a field.
Should `place_id` be replaced by `antenna_id` everywhere?
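Until the two names are reconciled, one defensive option is to accept either header when reading the antennas file. `load_antennas` is a hypothetical standalone helper, not bandicoot's loader. It prefers `antenna_id` and falls back to the deprecated `place_id`.

```python
import csv


def load_antennas(path):
    """Map antenna id -> (latitude, longitude) from a CSV file.

    Reads the id from 'antenna_id', falling back to the deprecated 'place_id'
    header, so files written with either spelling load the same way.
    """
    antennas = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = row.get("antenna_id") or row.get("place_id")
            antennas[key] = (float(row["latitude"]), float(row["longitude"]))
    return antennas
```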
Hi there,
I've been trying the bandicoot app and documentation, and it looks really impressive. I was able to create a metadata.csv file with my own mobile phone call/text data, but the antenna information is missing (the antenna_id column is empty).
Any idea why this is happening?
Thanks in advance
Hi. I am trying to use Bandicoot to analyze Wi-Fi data usage. The problem is very similar, so mapping the variables is easy. However, I am confused by the `correspondent_id` field in the .csv files. The getting-started guide begins by loading /data/records/A.csv and states: "bandicoot uses one record file per user. Record files are structured as follows:"
From this I inferred that all the data in A.csv is about user A. At first I assumed that `correspondent_id` was the id of the user. I now realize this doesn't make sense: the example contains different `correspondent_id` values, which would all be the same if the data were about a single user.
So what is `correspondent_id` really?
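The question's own reasoning suggests an answer: in a per-user file where `correspondent_id` varies row by row, it would identify the other party of each interaction (the contact), not the file's owner. This is an interpretation to check against the docs. The hypothetical helper below counts interactions per `correspondent_id` in one record file.

```python
import csv
import io
from collections import Counter


def interactions_per_correspondent(csv_text):
    """Count records per correspondent_id in one user's record file (given as text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["correspondent_id"] for row in reader)
```

For Wi-Fi data, the analogous column would presumably hold whatever the user interacts with (for example an access point or remote host). That mapping is an assumption about your data, not something the docs state.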
Simple bug in bandicoot/helper/tools.py, lines 326 to 329: `/` performs integer division when both operands are `int` (Python 2 semantics). Operands usually aren't `int`, but the code should not rely on that. Fix it by converting `pt1` and `pt2` to `float`.
This probably doesn't affect normal use (location points usually won't be `int`). However, it looks like it has led to incorrect test conditions in bandicoot/tests/test_utils.py, lines 77 to 82, since all the test points there are `int`.
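Under Python 2, `7 / 2` evaluates to `3`. Python 3, or Python 2 with `from __future__ import division`, gives `3.5`. The sketch below shows the suggested fix pattern, coercing to `float` before dividing. `midpoint` is a hypothetical function chosen for illustration. It is not the code on lines 326 to 329.

```python
def midpoint(pt1, pt2):
    """Midpoint of two (x, y) points, safe under Python 2 division rules.

    Coercing every coordinate to float first means `/` never truncates,
    even when all inputs are ints.
    """
    x1, y1 = float(pt1[0]), float(pt1[1])
    x2, y2 = float(pt2[0]), float(pt2[1])
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

With all-`int` test points such as `(0, 0)` and `(1, 3)`, this returns `(0.5, 1.5)` on both Python versions, so test expectations would no longer depend on truncation.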
I was trying to test data collection by downloading my own data with the Android app, but I could not find the app on the Google Play Store. Is the app no longer available?
https://play.google.com/store/apps/details?id=edu.mit.media.bandicoot
After loading a file with overlapping calls, I get warnings like this one:
`WARNING:root:{0:.2%} of calls overlap the next call by more than 5 minutes.`
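The `{0:.2%}` in the message is a Python format specification (a percentage with two decimals). If it appears literally in the log, the message was probably emitted without `.format(...)` being applied. The sketch below shows one way such a fraction could be computed and formatted. The `(start, duration in seconds)` input shape and `overlap_fraction` are assumptions, not bandicoot's code.

```python
from datetime import datetime, timedelta


def overlap_fraction(calls, threshold=timedelta(minutes=5)):
    """Fraction of calls whose end passes the next call's start by more than `threshold`.

    `calls` is a list of (start_datetime, duration_seconds) tuples.
    """
    calls = sorted(calls)
    if not calls:
        return 0.0
    overlapping = 0
    # Compare each call with the one that starts right after it.
    for (start, duration), (next_start, _) in zip(calls, calls[1:]):
        end = start + timedelta(seconds=duration)
        if end - next_start > threshold:
            overlapping += 1
    return overlapping / len(calls)
```

With calls at 10:00 (10 minutes), 10:04 (1 minute) and 11:00 (1 minute), only the first overlaps its successor by more than 5 minutes. The fraction is 1/3, and `"{0:.2%}".format(...)` renders it as `33.33%`.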
I am trying to run the full pipeline cleanly using PySpark, but I get this error:
TypeError Traceback (most recent call last)
in ()
----> 1 errors = bc.read_csv(error_user_id, records_with_errors_path, drop_duplicates=True)
TypeError: read_csv() got an unexpected keyword argument 'drop_duplicates'
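A TypeError of this kind means the `read_csv` being called does not define that parameter. This may be another case of the pip release lagging behind GitHub, as reported for optional_parser above. `supports_kwarg` is a hypothetical helper built on the standard `inspect.signature`. It lets you check the installed function before calling it.

```python
import inspect


def supports_kwarg(func, name):
    """Return True if `func` accepts a keyword argument called `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs parameter also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

Usage: `supports_kwarg(bc.read_csv, "drop_duplicates")`. A False result points to an older installed version, in which case installing from the GitHub repository is likely the fix.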
In individual.py, the function percent_pareto_interactions is computed with:
return (len(user_count) - len(user_sort)) / len(records)
I do not understand why we divide by len(records). Shouldn't it be divided by len(user_count) given that we are computing the percentage of the user's contacts?
The line would thus be:
return (len(user_count) - len(user_sort)) / len(user_count)
Apologies if I have missed something.
P
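To see how much the denominator changes the result, here is a sketch of a Pareto-style measure normalised by the number of distinct contacts, as proposed. It assumes the measure is the smallest share of contacts that accounts for a given fraction of interactions (default 0.8, an assumption). `pareto_contact_fraction` is a hypothetical function, not bandicoot's implementation.

```python
from collections import Counter


def pareto_contact_fraction(correspondents, percentage=0.8):
    """Smallest fraction of contacts accounting for `percentage` of interactions.

    `correspondents` holds one correspondent id per record. The number of
    contacts needed is divided by the number of distinct contacts.
    """
    counts = sorted(Counter(correspondents).values(), reverse=True)
    if not counts:
        return None
    target = percentage * sum(counts)
    covered = 0
    # Add contacts from most to least frequent until the target is reached.
    for needed, count in enumerate(counts, start=1):
        covered += count
        if covered >= target:
            return needed / len(counts)
```

With 10 records, 8 of them with contact `a` and one each with `b` and `c`, one contact out of three covers 80%, giving 1/3. Dividing the same count by the 10 records would give 0.1 instead.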
I found an issue in bandicoot/bandicoot/helper/group.py at line 183: I had to comment that line out for it to work within metrics.all().
With the current code, I get the following error:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/neetipokhriyal/Research/opalalgorithms/src/bandicoot/bandicoot/helper/group.py", line 181, in group_records_with_padding
pointer = next(_range)
StopIteration
With that line commented out, there is no error.
It could also be a versioning issue; I use Python 3.6.
Thanks.
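Instead of commenting out the line, another option is to guard the `next(_range)` call shown in the traceback. A bare `next()` on an exhausted iterator raises StopIteration. Under PEP 479 (the default from Python 3.7), a StopIteration that escapes inside a generator becomes a RuntimeError. The two-argument form of `next` avoids both. `advance` is a hypothetical helper. What group_records_with_padding should do once the range is exhausted is left as an open question.

```python
_END = object()  # sentinel that cannot collide with real items


def advance(iterator):
    """Return (True, item) for the next item, or (False, None) when exhausted.

    Unlike a bare next(iterator), this never raises StopIteration.
    """
    item = next(iterator, _END)
    if item is _END:
        return False, None
    return True, item
```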
I saw the demo in the notebook, but I can't find documentation on the graphing functionality. I wrote the same code as the demo but with my data, and the index.html files are not generated after I run these lines:
bc.special.demo.export_antennas(U, 'viz/mobility_view')
bc.special.demo.export_transitions(U, 'viz/mobility_view')
bc.special.demo.export_timeline(U, 'viz/event_timeline')
bc.special.demo.export_network(U, 'viz/network_view')
I created those folders because I was getting errors. Now there are no errors, but also no graphics. I'd appreciate help with this :)
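To confirm whether the export calls wrote anything at all, you can list the files under each output folder. `list_generated_files` is a hypothetical helper that relies only on `os.walk`.

```python
import os


def list_generated_files(root):
    """Return the relative paths of all files under `root`, sorted (empty if none)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

Usage: `for folder in ("viz/mobility_view", "viz/event_timeline", "viz/network_view"): print(folder, list_generated_files(folder))`. An empty list means nothing was written there. Note that `os.walk` silently yields nothing for a folder that does not exist.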
I am working on a cluster that only supports Python 2.6.6, and bandicoot is written in Python 3.
I am wondering which is more efficient: modifying some parts and backporting them, or taking the old bandicoot version (2.6) and adding the new indicators I need for my model (present in the new bandicoot version but not in the old one). Could you please tell me which option seems most suitable? I would really appreciate it.
I think the issue lies in io.py, in the schema definition at line 230:
'direction': (not_callandtext and r.direction is None) or r.direction in ['in', 'out'],
should be
'direction': (not_callandtext and r.direction in [None, '']) or r.direction in ['in', 'out'],
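The proposed change matters because `csv.DictReader` returns an empty string, not None, for an empty field. If the records are read that way, a missing direction fails the original `is None` test. Below is a standalone sketch of the proposed rule, with `not_callandtext` taken from the quoted line. `direction_is_valid` is a hypothetical name.

```python
def direction_is_valid(direction, not_callandtext):
    """Proposed rule: a missing direction (None or '') is allowed for records that
    are neither calls nor texts; otherwise direction must be 'in' or 'out'."""
    return (not_callandtext and direction in (None, "")) or direction in ("in", "out")
```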