Comments (6)
I am unable to replicate your result using jsan. Could you please supply a minimal command example of what you're doing to generate this inconsistency? On my end, for tasks such as twitter-search, twitter-stream, and twitter-user, I can convert the resulting JSON file to csv with
cat mydata.json | jsan --keep id id_str user:id user:id_str
or equivalently,
jsan --input mydata.json --keep id id_str user:id user:id_str
The resulting csv columns for id, id_str, user:id, and user:id_str all contain the same data as the raw JSON file (i.e., no truncation).
If you can supply a working example of your problem I'll try to help diagnose what's going wrong.
Best,
Nick
from massmine.
Is it possible that your spreadsheet program is truncating the view of the offending columns, given that the id numbers tend to be large numbers? If you expand the column view, perhaps the whole id number is present but only obscured by the margins of the column cells?
It's doing a whack decimal-place thing:
I created this CSV with this command (working w/ Aaron):
jsan --input=love.json --output=love.csv
The command is straight out of the documentation.
This is the expected behavior of MS Excel. For very large numbers, Excel displays the value in scientific (E) notation to save space on screen. You can replicate this manually by opening a new Excel spreadsheet. Click on a cell, type a long number (e.g., 2376824568347563845), and press enter. Excel will display that number as 2.37683E+18 (or similar). This is the same number represented in short form.
Importantly, this reflects the display behavior of Excel, and does not change the number at all. If you click the cell, the full number will display in the formula (fx) field at the top of the spreadsheet.
Also, this is not a result of massmine or jsan. You can confirm this by opening your csv file with a text editor. If you visually inspect the csv file, you'll find that the id values inside will be the full number NOT in scientific notation.
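The text-editor check described above can also be scripted. The following is a minimal sketch, not part of massmine or jsan: it scans the raw text of a csv file and reports any id cells that are written in scientific notation. The function name `find_scientific_ids` and the sample data are hypothetical, and the sketch assumes the csv header uses the column names id and id_str mentioned earlier in this thread.

```python
import csv
import io
import re

# Matches values such as 6.41623E+17 that a spreadsheet may have written back.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def find_scientific_ids(csv_text, columns=("id", "id_str")):
    """Return (row_number, column, value) for each id cell in scientific notation."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            value = (row.get(column) or "").strip()
            if SCI_NOTATION.match(value):
                hits.append((row_number, column, value))
    return hits


# Hypothetical sample data, not taken from the files discussed in this thread.
untouched = "id,id_str\n641623456789012345,641623456789012345\n"
resaved = "id,id_str\n6.41623E+17,6.41623E+17\n"

print(find_scientific_ids(untouched))  # no hits expected for full-length ids
print(find_scientific_ids(resaved))    # both cells should be reported
```

For a real file, the text could be read with `open("love.csv", newline="").read()` and passed to the function. An empty result suggests the ids in the file itself are intact.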
I hope this helps!
Nick
When I click the cell, here's what displays in fx field:
641623000000000000
It is not Excel. Look at the CSV MassMine gave me:
And here is the JSON, same record:
I would have to go into the JSON file to find the tweet ID.
Here is a working example that I just ran on Research Computing (I assume you're using RC for the data collection and conversion steps... please correct me if I am mistaken):
module load massmine
massmine -t twitter-stream -q love -c 10 -o love.json
jsan --input love.json --output love.csv
Provided you do not first open the love.csv file in Excel (or Mac Numbers, or whatever else), and you open it directly with a text editor, the resulting id and id_str data in the love.csv file are identical to the original love.json file:
For me, running the above code works on both Research Computing and my personal laptop.
What often happens with Excel, Numbers, and the like is that, after you open a file with them, they change the underlying file (for example, by turning large numbers into scientific notation and by converting dates into the date format the spreadsheet software prefers). This is why, annoyingly, after you open a .csv file with one of these programs, it asks whether you want to save the file when you close it, even if you changed nothing. This routinely happens to students in my methods course when they open their data with a spreadsheet just to view it.
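To illustrate how digits can be lost once a value has been written back in scientific notation, here is a small sketch. It is an assumption-laden model, not a description of Excel's internals: it assumes the spreadsheet writes the id with five decimal places of scientific notation, and the id 641623456789012345 and the function name `spreadsheet_round_trip` are hypothetical.

```python
def spreadsheet_round_trip(id_text, digits=5):
    """Model an id being saved in scientific notation and then read back.

    Assumption: the value is written with `digits` decimal places, so all
    digits beyond that precision are discarded on save.
    """
    shown = f"{int(id_text):.{digits}E}"  # what ends up in the saved file
    restored = int(float(shown))          # what a later reader can recover
    return shown, restored


# Hypothetical id; only its leading digits resemble the value quoted earlier.
shown, restored = spreadsheet_round_trip("641623456789012345")
print(shown)     # 6.41623E+17
print(restored)  # 641623000000000000
```

Under these assumptions, the trailing digits of the id cannot be recovered from the saved file, which is consistent with seeing a value padded with zeros in the fx field.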
Can you please create a new csv file using jsan, and open it with a text editor without opening it first with Excel/Numbers/etc.? You can start with your original love.json file and run it through jsan again. Then, if the id numbers are the same, open and save the .csv file in Numbers/Excel/etc. Reopening the .csv in a text editor should reveal where and when the conversion is happening.
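The comparison requested above could also be done programmatically. This is a rough sketch under two assumptions that the thread does not confirm: that the JSON file holds one tweet object per line, and that the csv rows appear in the same order as the JSON records. The function name `compare_ids` and the sample records are hypothetical.

```python
import csv
import io
import json


def compare_ids(json_lines_text, csv_text, field="id_str"):
    """Return (record_number, json_value, csv_value) for every mismatch.

    Assumes one JSON object per line and the same record order in both
    inputs. Extra records in the longer input are not compared.
    """
    json_ids = [
        str(json.loads(line)[field])
        for line in json_lines_text.splitlines()
        if line.strip()
    ]
    csv_ids = [row[field] for row in csv.DictReader(io.StringIO(csv_text))]
    return [
        (number, j, c)
        for number, (j, c) in enumerate(zip(json_ids, csv_ids), start=1)
        if j != c
    ]


# Hypothetical records, not taken from love.json or love.csv.
sample_json = '{"id": 641623456789012345, "id_str": "641623456789012345"}\n'
fresh_csv = "id,id_str\n641623456789012345,641623456789012345\n"
resaved_csv = "id,id_str\n6.41623E+17,6.41623E+17\n"

print(compare_ids(sample_json, fresh_csv))    # [] when the ids match
print(compare_ids(sample_json, resaved_csv))  # lists the mismatching record
```

Running this once on the freshly generated csv and once after saving it from the spreadsheet program should show at which step the ids change.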
Thanks,
Nick