Comments (3)
from baleen.
To fix this, I wrote a whitelist of Post IDs:
Wrote 566142 Posts IDs in 644.759 seconds
Post.objects.count() == 566142
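Here is a minimal sketch of writing such a whitelist. It assumes the IDs arrive as an iterable. With baleen's MongoEngine `Post` document, which `Post.objects.count()` suggests, `Post.objects.scalar('id')` could supply them. That call and the helper names `write_whitelist`/`read_whitelist` are illustrative only, not the original script. The sketch writes one ID per line and returns the count, so the result can be checked against `Post.objects.count()`:

```python
import os
import tempfile
import time
from typing import Iterable, Set


def write_whitelist(post_ids: Iterable, path: str) -> int:
    """Write one Post ID per line to `path` and return how many were written.

    Hypothetical helper: any iterable of IDs (strings or ObjectIds) works.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for pid in post_ids:
            fh.write(f"{pid}\n")  # str() of an ObjectId is its 24-char hex form
            count += 1
    return count


def read_whitelist(path: str) -> Set[str]:
    """Load the whitelist back as a set of ID strings, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


if __name__ == "__main__":
    ids = ["571a333ac1808103a0d6067c", "57726c2ac1808103a5ed63d6"]
    path = os.path.join(tempfile.mkdtemp(), "whitelist.txt")
    start = time.time()
    n = write_whitelist(ids, path)
    print(f"Wrote {n} Posts IDs in {time.time() - start:0.3f} seconds")
    # Sanity check in the spirit of `Post.objects.count() == N`
    assert n == len(read_whitelist(path))
```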
I wrote a script called blaze.py that goes through the posts and attempts to find any decoding errors:
100%|███████████████████████████████████████████████████████████████████████████████| 566142/566142 [02:09<00:00, 4365.45id/s]
Phase One: wrote 566142 Posts IDs in 2 minutes 9 seconds
100%|█████████████████████████████████████████████████████████████████████████████| 566142/566142 [37:09<00:00, 267.01posts/s]
Phase Two: wrote 2 Post errors in 37 minutes 9 seconds
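The two phases above can be sketched roughly as follows. This is a hypothetical reconstruction, not blaze.py itself. It assumes each post's content can be loaded as raw bytes through a caller-supplied `fetch_raw` function. It reports failures as `"id",message` rows to match the error output listed below. Phase one is reduced to collecting the IDs into a list.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def scan_posts(
    post_ids: Iterable[str],
    fetch_raw: Callable[[str], bytes],
) -> List[Tuple[str, str]]:
    """Phase two: try to UTF-8 decode each post's raw bytes, collecting failures.

    `fetch_raw` is a hypothetical loader returning a post's content as bytes.
    Returns (post_id, error_message) pairs for posts that fail to decode.
    """
    errors = []
    for pid in post_ids:
        try:
            fetch_raw(pid).decode("utf-8")
        except UnicodeDecodeError as exc:
            errors.append((pid, str(exc)))
    return errors


def format_error(pid: str, message: str) -> str:
    """Render one error row in the same shape as the output below."""
    return f'"{pid}",{message}'


if __name__ == "__main__":
    # Stand-in for the database: ID -> raw stored bytes.
    store: Dict[str, bytes] = {
        "a1": "plain ascii".encode("utf-8"),
        "b2": "María".encode("utf-8"),
        "c3": b"Mar\xeda",  # cp1252 bytes that are not valid UTF-8
    }
    ids = list(store)  # Phase one: collect the IDs up front
    start = time.time()
    errors = scan_posts(ids, store.__getitem__)
    for pid, msg in errors:
        print(format_error(pid, msg))
    print(f"Phase Two: wrote {len(errors)} Post errors in {time.time() - start:0.1f} seconds")
```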
It found only 2 errors:
"571a333ac1808103a0d6067c",'utf-8' codec can't decode byte 0xed in position 48824: invalid continuation byte
"57726c2ac1808103a5ed63d6",'utf-8' codec can't decode byte 0xed in position 21004: invalid continuation byte
from baleen.
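Both errors point at a 0xED byte. In UTF-8, 0xED starts a three-byte sequence and must be followed by a byte in 0x80 to 0x9F. Higher values would encode UTF-16 surrogates, which UTF-8 forbids. This pattern is consistent with text stored in a single-byte encoding such as cp1252/Latin-1, where 0xED is 'í'. It is also consistent with surrogate halves encoded byte-wise (CESU-8). Nothing in these two posts confirms which. A minimal sketch of a tolerant decoder follows. It assumes cp1252 as the fallback, and `decode_with_fallback` is a hypothetical helper. Passing `errors="replace"` to the UTF-8 decode instead would keep the text as UTF-8 and substitute U+FFFD for the bad byte.

```python
def decode_with_fallback(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8, falling back to a single-byte codec on failure.

    The fallback codec is an assumption; the real source encoding of the
    offending posts is unknown. errors="replace" covers the few byte values
    cp1252 leaves undefined.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    bad = b"Mar\xeda"  # 'María' encoded as cp1252, not UTF-8
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        # 'utf-8' codec can't decode byte 0xed in position 3: invalid continuation byte
        print(exc)
    print(decode_with_fallback(bad))  # María
```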