Sometimes, we need to query external databases using only SQL. One of the common use cases when this occurs is calculating multi-hop mobility.
What is multi-hop mobility?
To better understand multi-hop, imagine 3 users (u1, u2, and u3) and 3 cellphone towers that they might ping off of (A, B, and C).
Over the course of a day, the first user may travel in such a way, that when using their phone, they ping the following towers:
AAAA BBB C A
Additional user trips might be:
u2: C A BB
u3: CC BBB A C
How do you calculate multi-hop mobility?
To calculate the mobility for each user user, we need to calculate all trips between areas, but not count "self-loops", or when the start and end towers are the same (i.e. A -> A).
First, we would simplify the trips to remove self-loops, finding their basic path in chronological order. For example, u1: A, B, C, A
Next, we need to calculate all permutations of trips among these locations:
- Single-hop trips: A -> B, B -> C, C -> A
- Two-hop trips: A -> C, B -> A
- Three-hop trips: A -> A (this can be thrown out, as it is a self loop)
If there were more trips, perhaps if the path was: A, B, C, A, C, A, B, we would continue until all hops are recorded.
SQL query should ultimately return either a table with only the hops, and no user information, such as:
- AB
- BC
- CA
- AC
- BA
and so on for each user, or a total, aggregated count of each possible hop, for example:
- AB - 52
- CB - 23
- BC - 10
- AC - 15
- CA - 139
The database could be hundreds of thousands to millions of lines, so the query should be efficient. It could be broken down into parts and use temporary tables, for example, so portions of the query can be run at a time to minimize chances of timing out. Variable names should be clear, so that this can be easily adapted to multiple different databases.
For testing purposes:
The columns are
sim_id : identifier of sim card
timestamp : datetime dd-mm-yyy hh:mm:ss, where hh is given in 24 hour format
event : the type of event that generated this row, can be sms, call, data
site_id : antenna identifier
The activity sequences are:
u1: AAAA BBB C A
u2: C A BB
u3: CC BBB A C
The simplified sequences are:
U1: A B C A
U2: C A B
U3: C B A C
The mobility results should be:
AB: 2
AC: 2
BA: 2
BC: 2
CA: 3
CB: 2
Resources
SQL online sandbox: http://rextester.com/l/sql_server_online_compiler