sms-backup-and-restore-extractor's Introduction

About

The SMS Backup & Restore app on Google Play (package name: com.riteshsahu.SMSBackupRestore) lets you back up:

  • your entire SMS history (including attachments)
  • your whole or partial call history
  • your Contacts ('Address book')

The app's developers provide an online tool for viewing the contents of these backups (https://www.synctech.com.au/sms-backup-restore/view-backup/), but it offers no easy way to extract the data. That is where this repository comes in.

I wrote a Python script to extract data from those backups. Right now the script can:

  1. extract the images out of your sent/received MMS messages
  2. create a de-duplicated call log out of your entire call history
  3. 🚧 (NEW) 🚧 extract any saved images, media, or keys from your contact backup files. For example, if you chose a custom photo for someone in your contacts, made a backup, and now want to retrieve that contact photo.

Details

Messages & Calls: Backup format

The app saves your SMS messages, including images, in an XML file named sms-<timestamp>.xml. Images sent or received as MMS messages are embedded in that file as Base64-encoded data.

This script searches the XML files for MMS messages, then decodes the Base64 data to turn it back into regular image files.
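
A minimal sketch of that decoding step, using only the standard library rather than lxml. It assumes each MMS attachment is a <part> element whose MIME type is in a `ct` attribute, whose Base64 payload is in `data`, and whose original filename (if any) is in `cl`, the attribute the first issue further down recommends. The function name `extract_mms_images` and the `part_N` fallback name are hypothetical, not this repository's code. Streaming with `iterparse` and clearing each part after it is written keeps the large Base64 strings from piling up in memory on multi-gigabyte backups.

```python
import base64
import os
import xml.etree.ElementTree as ET


def extract_mms_images(xml_path, output_dir):
    """Decode every image <part> in an SMS backup and write it to output_dir.

    Assumed format: <part ct="image/jpeg" cl="photo.jpg" data="<base64>" />
    nested inside <mms> elements. Returns the list of paths written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    # iterparse streams the file instead of building the whole tree up front.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "part":
            continue
        content_type = elem.get("ct", "")
        data = elem.get("data")
        if content_type.startswith("image/") and data:
            # Missing filenames may appear as an empty string or the literal "null".
            name = os.path.basename(elem.get("cl") or "")
            if name in ("", "null"):
                name = f"part_{len(written)}"  # hypothetical fallback name
            path = os.path.join(output_dir, name)
            # Avoid overwriting an earlier attachment that had the same name.
            base, ext = os.path.splitext(path)
            counter = 1
            while os.path.exists(path):
                path = f"{base}_{counter}{ext}"
                counter += 1
            with open(path, "wb") as f:
                f.write(base64.b64decode(data))
            written.append(path)
        # Drop the (potentially huge) Base64 attribute once it has been handled.
        elem.clear()
    return written
```

For example, `extract_mms_images("sms-20240709093820.xml", "output")` would write every image attachment from that backup into `output/`.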

Calls are also saved in XML files, named calls-<timestamp>.xml. This script creates a CSV file out of all of your call backups, while accounting for duplicates.
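
A minimal sketch of the merge-and-deduplicate step, assuming each call is a <call> element with `number`, `duration` (seconds), `date` (milliseconds since the epoch), `type`, `readable_date`, and `contact_name` attributes, and that `type` codes 1, 2, and 3 mean incoming, outgoing, and missed. Two records count as duplicates when they share the same timestamp and number. The column layout follows the sample call_log.csv under "Output info"; writing "N/A" for the duration of missed calls is a guess based on that sample. `build_call_log` and `format_duration` are hypothetical names.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed meaning of the `type` attribute; unknown codes are passed through as-is.
CALL_TYPES = {"1": "Incoming", "2": "Outgoing", "3": "Missed"}

HEADER = ["Call Date (timestamp)", "Call date", "Call type", "Caller name",
          "Caller #", "Call duration (s)", "Call duration", "Call Id #"]


def format_duration(seconds):
    """Render 65 as '1 minute, 5 seconds' (hours are folded into minutes)."""
    minutes, secs = divmod(seconds, 60)
    parts = []
    if minutes:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    if secs or not parts:
        parts.append(f"{secs} second{'s' if secs != 1 else ''}")
    return ", ".join(parts)


def build_call_log(xml_paths, csv_path):
    """Merge several calls-*.xml backups into one de-duplicated CSV; return the row count."""
    unique = {}
    for path in xml_paths:
        for call in ET.parse(path).getroot().iter("call"):
            # Overlapping backups repeat the same call; keep only the first copy.
            key = (call.get("date"), call.get("number"))
            unique.setdefault(key, call)

    rows = sorted(unique.values(), key=lambda c: int(c.get("date", "0")))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for call_id, call in enumerate(rows):
            call_type = CALL_TYPES.get(call.get("type"), call.get("type"))
            if call_type == "Missed":
                seconds, readable = "N/A", "N/A"
            else:
                seconds = call.get("duration", "0")
                readable = format_duration(int(seconds))
            writer.writerow([call.get("date"), call.get("readable_date"), call_type,
                             call.get("contact_name") or "(Unknown)",
                             call.get("number"), seconds, readable, call_id])
    return len(rows)
```

Usage might look like `build_call_log(sorted(glob.glob("backups/calls-*.xml")), "call_log.csv")`. Keying on timestamp plus number, rather than on every attribute, tolerates backups that record the same call with slightly different contact names.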

V-Card/VCF parser

This app also lets you back up your contacts to one large VCF file. There are three versions of the vCard standard (2.1, 3.0, and 4.0), and thankfully this parser supports all three.

Any of the following vCard properties:

  • PHOTO
  • SOUND
  • LOGO
  • KEY

will be downloaded if they are stored as a URL, or otherwise decoded from Base64.
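
The sketch below shows one way to pull those properties out of a VCF file; it is an illustration, not this repository's parser, and `unfold` and `extract_media` are hypothetical names. It first unfolds continuation lines (a line starting with a space or tab continues the previous one), then recognizes the two common inline encodings: the ENCODING=BASE64 / ENCODING=b parameter used by versions 2.1 and 3.0, and the data: URI used by version 4.0. Anything else, typically a URL, is returned as a string rather than downloaded.

```python
import base64

# The four media-bearing properties listed in this README.
MEDIA_PROPS = {"PHOTO", "SOUND", "LOGO", "KEY"}


def unfold(text):
    """Join folded vCard lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw.strip():
            lines.append(raw)
    return lines


def extract_media(vcf_text):
    """Yield (property, value) pairs: decoded bytes for inline data, the raw string otherwise."""
    for line in unfold(vcf_text):
        head, sep, value = line.partition(":")
        if not sep:
            continue
        name, *params = head.split(";")
        name = name.split(".")[-1].upper()  # drop a group prefix such as "item1."
        if name not in MEDIA_PROPS:
            continue
        params = [p.upper() for p in params]
        if value.startswith("data:") and ";base64," in value:
            # vCard 4.0 style: PHOTO:data:image/jpeg;base64,<payload>
            payload = value.split(",", 1)[1]
            yield name, base64.b64decode("".join(payload.split()))
        elif any(p in ("ENCODING=BASE64", "ENCODING=B", "BASE64") for p in params):
            # vCard 2.1 (ENCODING=BASE64 or a bare BASE64) and 3.0 (ENCODING=b)
            yield name, base64.b64decode("".join(value.split()))
        else:
            # Typically a URL (e.g. VALUE=uri); left for the caller to download.
            yield name, value
```

A caller could write the byte values to disk and fetch the string values with `urllib.request.urlopen`. Stripping all whitespace before decoding also covers the indented Base64 continuation lines that version 2.1 files commonly use.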

Usage

Prerequisites

  • Python 3 (tested on Python 3.10.4)
  • LXML

Steps

  • Make sure the backup files start with either sms- or calls-, have the .xml extension, and are in their own directory.
  • Make sure the contact files all end in .vcf and are in their own directory.
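
As a rough illustration of those naming rules, a helper like the hypothetical `find_backups` below would select exactly the files each backup type expects. The mapping from backup type to filename pattern is an assumption drawn from the steps above, not the script's actual matching code.

```python
from pathlib import Path

# Assumed mapping from --backup-type to the filename pattern it consumes.
PATTERNS = {"sms": "sms-*.xml", "calls": "calls-*.xml", "vcf": "*.vcf"}


def find_backups(input_dir, backup_type):
    """Return the matching backup files in input_dir, sorted by filename."""
    try:
        pattern = PATTERNS[backup_type]
    except KeyError:
        raise ValueError(f"unknown backup type: {backup_type!r}") from None
    return sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
```

Because the timestamps in names like sms-20240709093820.xml have a fixed width, sorting by filename also sorts the backups chronologically.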

Usage

usage: backup_extractor.py [-h] [-i INPUT_DIR] [-t BACKUP_TYPE] [-o OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  -i INPUT_DIR, --input-dir INPUT_DIR
                        The directory where XML files (for calls or messages) are located
  -t BACKUP_TYPE, --backup-type BACKUP_TYPE
                        The type of extraction. Either 'sms' for message images or 'calls' to create a call log, or 'vcf' to extract media from a VCF/Vcard file
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The directory where media files that are found, will be extracted to
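
The help text above has the shape Python's argparse prints (the "options:" header is what Python 3.10 and later use). The parser below is a reconstruction for illustration only, not the repository's source; it shows how the three flags map to attribute names.

```python
import argparse


def build_parser():
    """Recreate the documented command-line interface (help strings copied from the output above)."""
    parser = argparse.ArgumentParser(prog="backup_extractor.py")
    parser.add_argument("-i", "--input-dir",
                        help="The directory where XML files (for calls or messages) are located")
    parser.add_argument("-t", "--backup-type",
                        help="The type of extraction. Either 'sms' for message images or 'calls' "
                             "to create a call log, or 'vcf' to extract media from a VCF/Vcard file")
    parser.add_argument("-o", "--output-dir",
                        help="The directory where media files that are found, will be extracted to")
    return parser


args = build_parser().parse_args(["-i", "backups", "-t", "calls", "-o", "extracted"])
print(args.input_dir, args.backup_type, args.output_dir)  # backups calls extracted
```

On the command line, the equivalent invocation would be `python3 backup_extractor.py -i backups -t calls -o extracted`.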

Output info

  • For extracting images from SMS backups only: if the MMS message's metadata includes a filename, that name is used for the output file; otherwise a random 10-letter filename is generated. Duplicate files are removed at the end.

  • When creating a call log, a file named call_log.csv is created that looks like this:

Call Date (timestamp),Call date,Call type,Caller name,Caller #,Call duration (s),Call duration,Call Id #
1451965221740,"Jan 4, 2016 7:40:21 PM",Incoming,Dad,+18183457890,65,"1 minute, 5 seconds",0
1452020364934,"Jan 5, 2016 10:59:24 AM",Missed,(Unknown),+11234560987,N/A,N/A,1
1452107940226,"Jan 6, 2016 11:19:00 AM",Incoming,Michael Jordan,+11234567890,194,"3 minutes, 14 seconds",2

  • For extracting media from vCard files only: the contact's name is used in the filename. If no name is present, a random 10-letter filename is used.
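
The README does not say how duplicates are detected. A common approach, sketched here under that assumption, is to hash each file's contents and delete any later file whose digest has already been seen; `remove_duplicates` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def remove_duplicates(output_dir):
    """Delete files whose bytes exactly match an earlier file; return how many were removed."""
    seen = set()
    removed = 0
    # Sorting makes the result deterministic: the alphabetically first copy is kept.
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            removed += 1
        else:
            seen.add(digest)
    return removed
```

Hashing content rather than comparing filenames catches the same image saved under two different random names.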

Limitations

  • The image portions of the backup don't contain date information, so it's impossible to determine when an image was created.

  • EXIF data is lost when restoring images.

The backups I had contained only image data, not audio or video. I don't know whether that's because no videos were sent, or because the app didn't back up messages containing audio or video.

Future Roadmap

  • Refactoring of the VCard/VCF parser
  • Add the ability to export messages to a CSV file

Contributing

I haven't used this backup application since 2016, so it's possible the schema has changed. If you encounter an issue, please include the date your backup was generated.

sms-backup-and-restore-extractor's People

Contributors

raleighlittles

sms-backup-and-restore-extractor's Issues

name attribute should be cl

Thanks for the script; it helped me recover some memories I would have otherwise lost.

The name attribute does not properly pull the identified name for the media file, but using cl does the trick. Substituting cl for name in this code block let me pull the original names out of the XML:

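        import base64
        import os
        import random
        import string

        # Pair each Base64 payload with its `cl` attribute (the original filename)
        # instead of `name`, which per this report did not hold the filename.
        for ext in image_ext_types:
            xpath_search_expr = xpath_search_str_base + ext + "']"
            b64_results_list.append([(b.attrib['data'], b.get('cl', '')) for b in root.findall(xpath_search_expr)])

        for result_type in b64_results_list:
            for (data, cl) in result_type:
                # Fall back to a random 10-letter name when no filename was recorded.
                if cl == "" or cl == "null":
                    cl = "".join(random.sample(string.ascii_letters, 10))
                with open(os.path.join(output_dir, cl), 'wb') as f:
                    f.write(base64.b64decode(data))
                    orig_files_count += 1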
                    orig_files_count += 1

Script only extracts a small portion of images from large (3 GB) file

I'm using this with a 3.7 GB SMS Backup & Restore file, and it only restores 32 images. There should be hundreds, as verified by loading the file into https://www.synctech.com.au/sms-backup-restore/view-backup/

I am using Ubuntu 20.04 with Python 3.8.10 and lxml 5.2.2. Here are the commands I ran and their output:

08:11 [alex@kite] ~/sms ┤ pip3 show lxml | grep Version
Version: 5.2.2

09:43 [alex@kite] ~/sms ┤ python3 --version
Python 3.8.10

09:44 [alex@kite] ~/sms ┤ ls
backupExtractor  output  sms-20240709093820.xml

10:31 [alex@kite] ~/sms ┤ python3 backupExtractor/backup_extractor.py -i /home/alex/sms -t sms -o /home/alex/sms/output
32 files created... Automatically removing duplicates
0 files removed

Any insight into how to troubleshoot this would be appreciated. Thanks for the wonderful tool!
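
One way to narrow down a mismatch like this is to count what the backup actually contains before looking at the extractor. The sketch below, which assumes MMS attachments are <part> elements with a `ct` (content type) attribute, streams the XML with the standard library's iterparse and tallies parts by type; `count_part_types` is a hypothetical name. Comparing the image/* totals with the number of files written shows whether images are being skipped, for example because they carry a content type the extractor does not search for.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_part_types(xml_path):
    """Tally MMS <part> elements by content type without loading the whole file."""
    counts = Counter()
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "part":
            counts[elem.get("ct", "(none)")] += 1
            elem.clear()  # free the Base64 payload right away
    return counts
```

For the backup above, `count_part_types("sms-20240709093820.xml").most_common()` would list each content type with its count, most frequent first.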
