Git Product home page Git Product logo

Comments (6)

lestcape avatar lestcape commented on June 2, 2024 1

Yes, this library is pretty slow, or i should said, I think that PHP is slow. With the library, you can pass your data line by line or all at the same time. If you pass the data all at the same time, this should be faster than only passing one line at time, but it will consume a lot of RAM. Have you enough RAM in production?

Anyway, you can try to find how to speed up the library, but that should be complex and probably the acceleration will be not enough satisfying to you.

The other alternative you have is to use PSPP. With PSPP there are two variants, that good, they depends on your real conditions.:

1- Generate a CSV file with PHP and also generate a syntax SPSS file from PHP, which reads the CSV file that you just generated and builds the base in SAV, already specifying everything else that is needed, but that is not in the CSV. You can then tell PSPP to execute the syntax. Let me clarify that this is about twice as fast, but not extremely fast.

2- If your database is in PostgreSQL, you are probably in luck. In Debian, they previously compiled PSPP with a plugin that allowed you to connect directly to PostgreSQL and extract data: https://www.gnu.org/software/pspp/manual/html_node/GET-DATA-_002fTYPE_003dPSQL.html

They no longer compile it like this in Debian, due to a copyright problem with an SSL library on which the solution depended, but you can try to compile it yourself with that plugin, if you don't already have it incorporated into your distro. In Debian:

Captura desde 2024-02-14 14-24-39

I don't think you can find a faster option than the ones I propose. The implementation of PSPP save file writings is the most robust and fastest free implementation you will find, but don't really expect miracles either.

from spss.

lestcape avatar lestcape commented on June 2, 2024

So, i recommend you to start the generation of the sav in the server, register an id of the process in a database and add there information about the progress of the process. Then in the client implement a procedure that check periodically for the state of the process using the id, that is in your database. At less is how i do that. I don't use this library for writing, I use PSPP instead.

And well, I also have another library in PHP to write the data in the STATA (https://www.stata.com/) format and another more that write the data in XBase format (https://en.wikipedia.org/wiki/.dbf). Both libraries are more faster than this one.

from spss.

sinnbeck avatar sinnbeck commented on June 2, 2024

Wow thanks for the awesome answers!

It can see it uses around 250 MB of ram (I use generators to keep memory as low as possible), but luckily the system only have 3 users that can export the .sav files, so chances are it wont be a problem.
I have implemented both versions and (single write and writeCase()) and the single write is indeed a bit faster but uses more ram. The good thing about the cases is that I can echo our which case I am currently adding.

I actually have pspp installed but I installed it with flatpak originally. I will try compiling it myself

  1. This sounds like an awesome idea. I has suggested to our users that I could just output both a .sav file without data along with a csv file or an xlsx file. They could then merge them inside SPSS. But if I can do this in code, that would be much much better! I hope I can pipe in the instructions.
  2. Sadly it doesn't and either way, the data needs to formatted and filtered before being saved to .sav

If I can just make it twice as fast then that would be quite an improvement!

Regarding the frontend. I plan to either use websockets/SSE to inform the user when the file is ready (and perhaps the progress if I use writeCase()) or just mail the user a link to the file once it is done :)

Can SPSS read STATA and XBase and do they have all the same features as .sav ?

from spss.

lestcape avatar lestcape commented on June 2, 2024

I has suggested to our users that I could just output both a .sav file without data along with a csv file or an xlsx file.

Yes well, you can give a csv file and a syntax file for convert the csv to sav, but the problem then will be that you don't know the absolute url where the user will save the files. I think SPSS don't work with relative urls unfortunately.

Can SPSS read STATA and XBase and do they have all the same features as .sav ?

XBase is not a database originally intended for statistics analyzed, so it have not the same features of SPSS. STATA is a statistics software that have similar feature than SPSS, that why his database have more less the same features. There are software for convert several statistics database in another, like can be stattransfer: https://stattransfer.com/, but well, SPSS can import easy and directly the data in the STATA and the dBase (XBase) format. But i don't recommended generate a database in a non default format, because what you will do is to add the limitations of each formats. All formats have their own limitations (example: the max size of what you can stored in a label).

I need all that formats, because i have several clients and some want the database in XBase - DBase (dbf), for work with Microsoft Visual FoxPro and another's because they use STATA to analyze the data instead of SPSS. Also CSV, because some use R. Our surveys are the officially health surveys of the government. There are a lot of researchers with different formations and with different interested involved, that used different software also to do the same thing.

Sadly it doesn't and either way, the data needs to formatted and filtered before being saved to .sav

It sounds like the most efficient task to do in a database manager like Postgres. I don't see the point, PSPP execute a Query and get the result. It don't limited you to execute a query that also formats and filters the data.

from spss.

sinnbeck avatar sinnbeck commented on June 2, 2024

Oh I had hoped there would be an "import cases from csv" thing in the menu or something. But they can still download it and put the files next to each other and then merge them using ADD FILES or UPDATE I assume.

I have been playing around with PSPP and I can see I can write instructions in a file and pipe it into PSPP with cat. I will give this a shot and see how fast it is. Just need to to create some test files :)

I just need to extract the data from a survey system, and convert it into a sav file. It is just for my colleagues who all use SPSS. I will stick to writing to a sav file then 👍

It sounds like the most efficient task to do in a database manager like Postgres.
It sounds good in theory but I would need to get all the data into the proper structure and format, insert it into postgres and then ask pspp to extract it again. Sadly the survey system isnt inhouse so I cannot just change databases :(

from spss.

lestcape avatar lestcape commented on June 2, 2024

O well, if your database is not in postgres already, have not sense move it to postgres to just do that.

I can tell you about our case:

We have our own system that is prepared for several things and what we do is connected our System with the survey system that we used. For example, one public system that we used and i can tell about it is the case of CSPro:

CSPro have a CSWeb server side, that store the data in a json format, inside a MySQL/MariaDb database. We have connected the CSWeb MariaDb database and trigger a procedure when CSWeb insert the data in MariaDB. The procedure call a PHP script that extract the current inserted data and reformat it. The reformatted data is send to our survey and reporting system. So, we don't care if the survey is realized in an external survey system, all data are finally send to our own system always, without matter the source. That is why we can used several survey system at same time and do a service in the same way not matter what.

from spss.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.