firebase-streaming-import's Introduction

Firebase Streaming Import

  • Uses the ijson Python JSON-streaming library together with requests to import a large JSON file into Firebase piecemeal.

  • This is a two-pass script: run it once in normal mode to write all the data, then run it again with --priority_mode to write the priority data.

  • Uses 8 parallel threads by default; tune the -t/--threads argument for the best performance on your setup.

  • Repeats work already done in firebase-import; however, firebase-import doesn't handle large JSON files well because Node runs out of memory. This script streams the data in instead of loading the whole file at once, so it is not limited by holding the entire file in memory; however, it might not be as fast or efficient as firebase-import.

  • The root of the destination tree does not need to be empty, since the script makes REST PATCH calls, which update only the keys they contain.

  • Speed: about 30 seconds per MB for datasets with many small leaf values; performance improves when leaves hold larger values.
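Because the writes are REST PATCH calls, each request merges only the keys it carries into the target node and leaves existing siblings alone. The sketch below shows roughly what one such request could look like using only the Python standard library; it builds the request without sending it. The helper name build_patch_request and the choice to pass the token as an auth query parameter are assumptions for illustration, not code taken from import.py.

```python
import json
import urllib.parse
import urllib.request


def build_patch_request(firebase_url, path, data, auth=None):
    """Build, but do not send, a Firebase REST PATCH request (hypothetical helper).

    PATCH merges `data` into the node at `path`, so sibling keys that are
    not mentioned in `data` are left untouched.
    """
    # REST endpoints are addressed by appending ".json" to the node path.
    url = firebase_url.rstrip("/") + "/" + path.strip("/") + ".json"
    if auth:
        # Assumption: the token goes in the `auth` query parameter.
        url += "?" + urllib.parse.urlencode({"auth": auth})
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


req = build_patch_request(
    "https://test.firebaseio.com/dest/path/", "users/alice", {"age": 30}, auth="TOKEN"
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```

Batching several leaves into one PATCH body cuts the number of round trips, which is one plausible reason many small leaves import more slowly per MB than a few large ones.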

Requirements:

  • Run pip install -r requirements.txt.
  • You may need to run pip install pp --allow-unverified pp to install the pp module.

usage: import.py [-h] [-a AUTH] [-t THREADS] [-s] [-p] firebase_url json_file

Import a large json file into a Firebase via json Streaming. Uses HTTP PATCH
requests. Two-pass script, run once normally, then again in --priority_mode.

positional arguments:
  firebase_url          Specify the Firebase URL (e.g.
                        https://test.firebaseio.com/dest/path/).
  json_file             The JSON file to import.

optional arguments:
  -h, --help            show this help message and exit
  -a AUTH, --auth AUTH  Optional Auth token if necessary to write to Firebase.
  -t THREADS, --threads THREADS
                        Number of parallel threads to use, default 8.
  -s, --silent          Silences the server response, speeding up the
                        connection.
  -p, --priority_mode   Run this script in priority mode after running it in
                        normal mode to write all priority values.
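The help text above has the shape of Python's argparse output. The following is a reconstruction of an equivalent parser written from that help text alone (the real import.py may differ, and type=int for --threads is an assumption), followed by the two-pass invocation the description calls for. The URL and file name are placeholders.

```python
import argparse


def build_parser():
    # Reconstructed from the printed help text; not copied from import.py.
    parser = argparse.ArgumentParser(
        description="Import a large json file into a Firebase via json Streaming. "
        "Uses HTTP PATCH requests. Two-pass script, run once normally, "
        "then again in --priority_mode."
    )
    parser.add_argument("firebase_url",
                        help="Specify the Firebase URL (e.g. https://test.firebaseio.com/dest/path/).")
    parser.add_argument("json_file", help="The JSON file to import.")
    parser.add_argument("-a", "--auth",
                        help="Optional Auth token if necessary to write to Firebase.")
    parser.add_argument("-t", "--threads", type=int, default=8,
                        help="Number of parallel threads to use, default 8.")
    parser.add_argument("-s", "--silent", action="store_true",
                        help="Silences the server response, speeding up the connection.")
    parser.add_argument("-p", "--priority_mode", action="store_true",
                        help="Run this script in priority mode after running it in "
                        "normal mode to write all priority values.")
    return parser


parser = build_parser()
url, data_file = "https://test.firebaseio.com/dest/path/", "data.json"  # placeholders
# Pass 1: write all the data.
first = parser.parse_args([url, data_file, "-t", "16"])
# Pass 2: same inputs again, this time writing the priority values.
second = parser.parse_args([url, data_file, "-t", "16", "--priority_mode"])
print(first.priority_mode, second.priority_mode, second.threads)
```

Both passes take the same URL and file; only --priority_mode changes between them.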

firebase-streaming-import's People

Contributors

m-tse, mtsegoog

firebase-streaming-import's Issues

Importing an array of items

Thank you so much for providing such an awesome tool.

My issue is that the large JSON file I have consists of a list of objects:
[ {...}, {...}, {...}, {...} ]
When I import this file into Firebase with your tool, every item in the list gets the same key:
{ "item": {...}, "item": {...}, "item": {...} }
instead of its index:
{ "0": {...}, "1": {...}, "2": {...} }

So when I use your tool, the first item in the array gets created, and then the tool keeps overwriting that item's values with the values of each following item as it goes through the list.
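This symptom is consistent with how ijson names array elements: every element of an array shares the prefix item (a top-level list produces prefixes such as item and item.name). If an importer builds Firebase paths directly from those prefixes (not confirmed here for import.py), every element lands on the same item node. One way around it is to track array positions yourself. The sketch below, with the hypothetical name indexed_leaves, consumes (event, value) pairs in the shape produced by ijson.basic_parse and yields slash-separated paths that use numeric indices; the events are written out by hand so it runs without ijson installed.

```python
def indexed_leaves(events):
    """Yield (path, value) for every scalar in a stream of basic_parse-style events.

    Array elements get their position ("0", "1", ...) as the path segment
    instead of a shared "item" prefix. Empty objects and arrays yield nothing,
    which matches Firebase not storing empty nodes.
    """
    stack = []  # one [kind, key_or_index] frame per open object/array
    for event, value in events:
        if event == "map_key":
            stack[-1][1] = value  # the next value in this object lives under this key
            continue
        if event in ("end_map", "end_array"):
            stack.pop()
            continue
        # Any other event starts a new value; inside an array, advance the index.
        if stack and stack[-1][0] == "array":
            stack[-1][1] += 1
        if event == "start_map":
            stack.append(["map", None])
        elif event == "start_array":
            stack.append(["array", -1])
        else:  # scalar: null, boolean, number/integer/double, string
            yield "/".join(str(frame[1]) for frame in stack), value


# Events for [{"a": 1}, {"a": 2, "b": [true, "x"]}] in (event, value) form.
events = [
    ("start_array", None),
    ("start_map", None), ("map_key", "a"), ("number", 1), ("end_map", None),
    ("start_map", None), ("map_key", "a"), ("number", 2),
    ("map_key", "b"), ("start_array", None), ("boolean", True), ("string", "x"),
    ("end_array", None), ("end_map", None),
    ("end_array", None),
]
for path, value in indexed_leaves(events):
    print(path, value)  # 0/a 1, then 1/a 2, 1/b/0 True, 1/b/1 x
```

With ijson installed, the same function could be fed ijson.basic_parse(open(json_file, "rb")) instead of the hand-written list, and each path written with a PATCH to its parent node.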

ImportError: No module named ijson

When I run the import, I get the following error:

Traceback (most recent call last):
  File "import.py", line 1, in <module>
    import ijson
ImportError: No module named ijson

Note: during the install I also got the following error:
$ pip install -r requirements.txt
Collecting requests (from -r requirements.txt (line 1))
  Downloading requests-2.18.4-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 92kB 875kB/s
Collecting argparse (from -r requirements.txt (line 2))
  Downloading argparse-1.4.0-py2.py3-none-any.whl
Collecting ijson (from -r requirements.txt (line 3))
  Downloading ijson-2.3-py2.py3-none-any.whl
Collecting traceback (from -r requirements.txt (line 4))
  Could not find a version that satisfies the requirement traceback (from -r requirements.txt (line 4)) (from versions: )
No matching distribution found for traceback (from -r requirements.txt (line 4))

I removed the traceback line, retried the install, and then got a second error:
Installing collected packages: certifi, chardet, idna, urllib3, requests, argparse, ijson
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 316, in clobber
    ensure_dir(destdir)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/certifi-2017.11.5.dist-info'
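Two separate problems show up in this report. traceback ships with Python's standard library, so there is no PyPI package by that name and the line can be dropped from requirements.txt (argparse has likewise been in the standard library since Python 2.7). The Permission denied error means pip tried to write into the system Python's site-packages and aborted the install, so ijson was most likely never installed, which would explain the ImportError; pip's --user flag or a virtual environment avoids writing there. The helper below (a hypothetical name, written to run on both Python 2.7 and 3) reports which of the script's imports the current interpreter cannot find:

```python
import sys


def missing_modules(names):
    """Return the module names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:  # also catches Python 3's ModuleNotFoundError
            missing.append(name)
    return missing


# Run this with the same interpreter that runs import.py.
print("interpreter: %s" % sys.executable)
print("missing: %s" % missing_modules(["requests", "argparse", "ijson", "traceback"]))
```

If the printed interpreter is not the one pip installed into, that mismatch alone can also produce "No module named ijson".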

Progress indication

The script only outputs its start, end, and error states. Since it streams the input, could it print an estimated time remaining or some other progress indicator?

I'd like to send a PR, but my python is pretty rusty.
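Because a streaming parser pulls the file through read() calls, progress can be measured as bytes consumed out of the file size. The sketch below wraps the file object, counts bytes, and prints a percentage with a linear time-remaining estimate to stderr. ProgressReader, estimate_remaining, and the 1 MiB reporting interval are hypothetical choices for illustration, not part of import.py.

```python
import io
import sys
import time


def estimate_remaining(bytes_done, total_bytes, elapsed_seconds):
    """Linear ETA: assume the rest of the file goes at the average rate so far."""
    if bytes_done <= 0:
        return None  # no rate measured yet
    return elapsed_seconds * (total_bytes - bytes_done) / float(bytes_done)


class ProgressReader(object):
    """File-like wrapper that reports progress as a streaming parser reads it."""

    def __init__(self, f, total_bytes, report_every=1 << 20):
        self._f = f
        self.total = total_bytes
        self.done = 0
        self._every = report_every
        self._next_report = report_every
        self._start = time.time()

    def read(self, size=-1):
        data = self._f.read(size)
        self.done += len(data)
        if self.done >= self._next_report:
            self._report()
            self._next_report = self.done + self._every
        return data

    def _report(self):
        eta = estimate_remaining(self.done, self.total, time.time() - self._start)
        pct = 100.0 * self.done / self.total if self.total else 100.0
        left = "?" if eta is None else "%.0f s" % eta
        sys.stderr.write("%.1f%% read, about %s left\n" % (pct, left))


# Demo on an in-memory "file". With a real file you would pass
# open(json_file, "rb") and os.path.getsize(json_file) instead.
reader = ProgressReader(io.BytesIO(b"x" * 10), total_bytes=10, report_every=4)
while reader.read(3):
    pass
```

Passing such a wrapper to ijson.parse in place of the raw file object should make progress appear as parsing advances, since the parser only needs read(). The estimate tracks bytes parsed, not writes confirmed by Firebase, so it can run ahead of the network.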

MemoryError when importing a 484 MB JSON file

When running

python import.py https://necir-hackathon.firebaseio.com/ test.json

I get this stacktrace:

started at 1469480152.98
Traceback (most recent call last):
  File "import.py", line 90, in <module>
    main(argParser.parse_args())
  File "import.py", line 20, in main
    for prefix, event, value in parser:
  File "R:\Python27\lib\site-packages\ijson\common.py", line 65, in parse
    for event, value in basic_events:
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 185, in basic_parse
    for value in parse_value(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 116, in parse_value
    for event in parse_array(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 138, in parse_array
    for event in parse_value(lexer, symbol, pos):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 119, in parse_value
    for event in parse_object(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 160, in parse_object
    pos, symbol = next(lexer)
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 59, in Lexer
    buf += data
MemoryError

test.json is 484 MB, but shouldn't that be fine for a streaming import?

Installation error

scott@classy ~/workspace/firebase-streaming-import master
 [31] → pip install -r requirements.txt
Collecting requests (from -r requirements.txt (line 1))
  Using cached requests-2.11.1-py2.py3-none-any.whl
Collecting argparse (from -r requirements.txt (line 2))
  Using cached argparse-1.4.0-py2.py3-none-any.whl
Collecting ijson (from -r requirements.txt (line 3))
  Using cached ijson-2.3-py2.py3-none-any.whl
Collecting traceback (from -r requirements.txt (line 4))
  Could not find a version that satisfies the requirement traceback (from -r requirements.txt (line 4)) (from versions: )
No matching distribution found for traceback (from -r requirements.txt (line 4))

Importing issue

I have an 861 MB database backup. I tried importing it multiple times into a clean database, leaving it running for 4+ hours and, once, overnight. I'm running the tool with 8 threads. Seven of the threads show no signs of activity; only one is active. For that one thread, memory climbs to around 8 GB and CPU sometimes goes over 100%, while the other threads stay idle. At some point network traffic stops and everything just hangs: there is still memory and CPU activity, but no packets are sent.
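Memory climbing while a single thread stays busy is the pattern one would expect if parsing runs on one thread and parsed writes pile up faster than the sender threads drain them. Whether import.py is structured that way is not confirmed here; the sketch below only illustrates a general remedy: a bounded queue.Queue between the parsing loop and N sender threads, so the parser blocks instead of buffering without limit. run_pipeline is a hypothetical Python 3 helper, and fake_send stands in for a real PATCH call.

```python
import queue
import threading


def run_pipeline(batches, send_batch, threads=8, max_pending=32):
    """Hand `batches` to `threads` worker threads through a bounded queue.

    put() blocks once `max_pending` batches are waiting, so a fast parser
    cannot buffer an unbounded amount of data ahead of slow network writes.
    Batches must not be None, since None is used as the stop sentinel.
    """
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more work for this thread
                return
            try:
                send_batch(batch)
            except Exception as exc:  # record and keep draining the queue
                errors.append(exc)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    try:
        for batch in batches:  # in a real importer: batches built while parsing
            pending.put(batch)
    finally:
        for _ in workers:
            pending.put(None)  # one sentinel per worker, even if parsing failed
        for t in workers:
            t.join()
    return errors


# Demo with a stand-in sender that just records what it receives.
sent = []
lock = threading.Lock()


def fake_send(batch):
    with lock:
        sent.append(batch)


errs = run_pipeline(({"k%d" % i: i} for i in range(100)), fake_send, threads=8, max_pending=4)
print(len(sent), errs)
```

The bound caps memory at roughly max_pending batches, and the sentinels in the finally block keep the workers from hanging if the parser raises partway through.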

Invalid syntax

Every time I try to run import.py, I get:

  File "import.py", line 60
    except Exception, e:
                    ^
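except Exception, e: is Python 2-only syntax, and Python 3 rejects it at compile time, which matches this error if the script is run with python3. Under Python 3 the clause is spelled with as, a form Python 2.6 and later also accept. A minimal illustration, using a hypothetical write_or_report helper:

```python
def write_or_report(write, payload):
    """Call a (hypothetical) write function, returning an error string on failure."""
    try:
        write(payload)
        return "ok"
    except Exception as e:  # Python 3 spelling of Python 2's "except Exception, e:"
        return "write failed: %s" % e


print(write_or_report(lambda p: None, {}))   # ok
print(write_or_report(lambda p: 1 / 0, {}))  # write failed: division by zero
```

Running the script with a Python 2 interpreter is the other option; after this line is fixed, other Python 2-only constructs may still surface under Python 3.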

Stuck on started at

I'm testing this with a JSON file that's only 1.7 MB, and it has been stuck on 'started at' for about 10 minutes since I ran the script. Is this normal?
