Comments (6)

n3mo avatar n3mo commented on August 30, 2024

I am unable to replicate your result using jsan. Could you please supply a minimal command example of what you're doing to generate this inconsistency? On my end, for tasks such as twitter-search, twitter-stream, and twitter-user, I can convert the resulting JSON file to CSV with

cat mydata.json | jsan --keep id id_str user:id user:id_str

or equivalently,

jsan --input mydata.json --keep id id_str user:id user:id_str

The resulting CSV columns for id, id_str, user:id, and user:id_str all contain the same data as the raw JSON file (i.e., no truncation).

If you can supply a working example of your problem I'll try to help diagnose what's going wrong.

Best,
Nick

from massmine.

n3mo avatar n3mo commented on August 30, 2024

Is it possible that your spreadsheet program is truncating the view of the offending columns, given that the id numbers tend to be large numbers? If you expand the column view, perhaps the whole id number is present but only obscured by the margins of the column cells?

macloo avatar macloo commented on August 30, 2024

It's doing a whack decimal-place thing:
[Screenshot, 2015-09-10 at 12:49:37 PM]

I created this CSV with this command (working w/ Aaron):

jsan --input=love.json --output=love.csv

The command is straight out of the documentation.

n3mo avatar n3mo commented on August 30, 2024

This is the expected behavior of MS Excel. For very large numbers, Excel displays the value in scientific E notation to save space on screen. You can replicate this manually by opening a new Excel spreadsheet. Click on a cell, type a long number (e.g., 2376824568347563845), and press Enter. Excel will display that number as 2.37682E+18 (or similar). This is the same number represented in short form.

Importantly, this reflects the display behavior of Excel, and does not change the number at all. If you click the cell, the full number will display in the formula (fx) field at the top of the spreadsheet.

Also, this is not a result of massmine or jsan. You can confirm this by opening your CSV file with a text editor. If you visually inspect the CSV file, you'll find that the id values inside are the full numbers, NOT in scientific notation.
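If scanning the file by eye is impractical, the same check can be scripted. The following is only a minimal sketch: it assumes the CSV has a header row containing an `id` column, the helper name `scientific_values` is made up for illustration, and the sample rows are hypothetical values rather than real tweet ids.

```python
import csv
import io
import re

# Matches values such as 6.41623E+17, the short form a spreadsheet may write.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def scientific_values(csv_text, column):
    """Return the values in `column` that are written in E notation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if SCI_NOTATION.match(row[column])]


# Hypothetical sample data (assumed layout: header row, then one record per line).
untouched = "id,id_str\n641623456789012345,641623456789012345\n"
resaved = "id,id_str\n6.41623E+17,6.41623E+17\n"

print(scientific_values(untouched, "id"))  # []
print(scientific_values(resaved, "id"))    # ['6.41623E+17']
```

To run it against a real file, read the text with `open("love.csv", newline="").read()` and pass it in place of the sample strings. An empty list means no id in that column was written in E notation.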

I hope this helps!
Nick

macloo avatar macloo commented on August 30, 2024

When I click the cell, here's what displays in fx field:

641623000000000000

It is not Excel. Look at the CSV MassMine gave me:

[Screenshot, 2015-09-10 at 4:15:45 PM]

And here is the JSON, same record:

[Screenshot, 2015-09-10 at 4:17:05 PM]

I would have to go into the JSON file to find the tweet ID.

n3mo avatar n3mo commented on August 30, 2024

Here is a working example that I just ran on Research Computing (I assume you're using RC for the data collection and conversion steps... please correct me if I am mistaken):

module load massmine
massmine -t twitter-stream -q love -c 10 -o love.json
jsan --input love.json --output love.csv

Provided you do not first open the love.csv file in Excel (or Mac Numbers, or whatever else), and instead open it directly with a text editor, the id and id_str data in love.csv are identical to those in the original love.json file:

Original love.json:
[Screenshot: json_screenshot]

Converted love.csv:
[Screenshot: csv_screenshot]

For me, running the above code works on both Research Computing and my personal laptop.

What often happens with Excel, Numbers, and the like is that once you open a file, they change the underlying data (such as turning large numbers into scientific notation and rewriting dates in the format preferred by the spreadsheet software). This is why, annoyingly, after you open a .csv file with one of these programs, it asks whether you want to save the file when you close it, even if you changed nothing. This routinely happens to students in my methods course when they open their data in a spreadsheet just to view it.
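To see why a re-saved file cannot simply be converted back, here is a small sketch using the example number from earlier in this thread. The six-significant-digit format is an assumption that approximates a default spreadsheet cell; it is not taken from any spreadsheet's documentation.

```python
original = "2376824568347563845"  # example number from earlier in this thread

# Six significant digits in E notation (assumed to approximate a default cell).
short_form = format(int(original), ".5E")
print(short_form)  # 2.37682E+18

# Reading the short form back cannot restore the digits that were dropped.
recovered = int(float(short_form))
print(recovered)                   # 2376820000000000000
print(recovered == int(original))  # False
```

Once the short form is what gets written to disk, the trailing digits of the id are gone, which is why the original JSON (or a fresh jsan conversion) is the only reliable source for the full value.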

Can you please create a new CSV file using jsan, and open it with a text editor without first opening it in Excel, Numbers, or similar? You can start with your original love.json file and run it through jsan again. Then, if the id numbers are the same, open and save the .csv file in your spreadsheet program. Reopening the .csv in a text editor should reveal where and when the conversion is happening.
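One way to make the before/after comparison unambiguous is to record a checksum of the file at each step. This is a rough sketch, not part of massmine or jsan: `file_digest` is a hypothetical helper, the file contents are made-up sample values, and the spreadsheet re-save is only simulated by rewriting a temporary file.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "love.csv")

    # Step 1: a freshly converted file (hypothetical contents).
    with open(path, "w", newline="") as handle:
        handle.write("id,id_str\n641623456789012345,641623456789012345\n")
    before = file_digest(path)

    # Step 2: simulate a spreadsheet saving the file with ids in E notation.
    with open(path, "w", newline="") as handle:
        handle.write("id,id_str\n6.41623E+17,6.41623E+17\n")
    after = file_digest(path)

print(before == after)  # False
```

With a real file, call `file_digest("love.csv")` right after running jsan and again after closing the spreadsheet. If the two digests differ, the spreadsheet rewrote the file.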

Thanks,
Nick
