Comments (6)
Currently, news-please does not support MySQL export out of the box, which is why it is not mentioned on the main page of the repository. We had that feature in one of the earlier versions, but to lower the project's maintenance cost (and since we didn't, and still don't, need MySQL) we decided to drop it. Instead, we added Elasticsearch support.
However, I imagine that except for some attribute renaming (e.g., the attribute formerly called html_title is simply called title in the latest stable version), MySQL support should run out of the box.
You can find the init-db.sql script here: https://github.com/fhamborg/news-please/blob/15abe7cfeb08b4a78e580f96215a9f651927a900/init-db.sql
Let me know if you run into any problems. In case they are minor I might be able to help you out. I would greatly appreciate if you get it running and create a pull request for that, so that the whole community can benefit from your efforts.
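For reference, initializing the database from that pinned script might look like the commands below. This is only a sketch: the database name (news-please) and the use of utf8mb4 are assumptions taken from this thread, not values the project mandates, and they must match whatever you put in config.cfg.

```
# Sketch only: load the pinned init-db.sql into a local MySQL server.
# 'news-please' is a placeholder database name from this thread.
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS \`news-please\` CHARACTER SET utf8mb4;"
mysql -u root -p news-please < init-db.sql
```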
from news-please.
I might be confusing two separate things. I'm trying to use MySQL to store the crawl history so I can cluster the news-please crawlers. I edited the configuration, but it didn't connect. I thought it might be because I didn't create a news-please table & schema in MySQL.
I agree with your decision 100%; I think Elasticsearch is a far better export target for a crawler than MySQL.
Okay, so I'm closing this. Regarding the cluster of news-please instances, I'd recommend separating them by domain; does that not work?
I apologize. I have overcomplicated a very simple question.
I would like news-please to save the crawl history to a MySQL database. I have updated config.cfg with the MySQL username & password, but it does not seem to be working: when I run news-please, it does not save any data to MySQL.
[MySQL]
# MySQL connection, required for saving meta-information
host = localhost
port = 3306
db = 'news-please'
username = 'crawler_root'
password = 'XXXXXXXXXXXX'
Are there steps missing from the documentation?
Do I need to create the news-please database & schema myself, or does the news-please app do that automatically?
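One generic pitfall worth checking here (news-please ships its own config loader, so its quoting rules may differ; the snippet below uses only Python's stock configparser as an illustration): with the standard parser, quote characters in INI values are kept as part of the value, and the original password line above ended with a stray backtick rather than a closing quote, which alone would break authentication.

```python
import configparser

# Minimal illustration with the stock parser; news-please's own loader
# may strip quotes, so treat this only as a way to inspect raw values.
cfg = configparser.ConfigParser()
cfg.read_string("""
[MySQL]
host = localhost
port = 3306
db = 'news-please'
""")

print(cfg["MySQL"]["db"])  # -> 'news-please' (the quotes are part of the value)
```

If the driver receives the quotes (or a trailing backtick) as literal characters of the database name or password, the connection fails even though the credentials look correct at a glance.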
Ah, now I got it :) The main reason why what you're planning does not work is that MySQL support was originally added to export crawled & extracted articles to MySQL (instead of to JSON files or Elasticsearch). The MySQL export was never intended to serve as a crawl history, even though one could of course use it as such. The other reason is that MySQL export support was dropped quite some time ago.
Anyway, what I wrote in my previous message still holds, so it should be easy for you to enable MySQL support. Simply initialize your MySQL database using the script I linked above. Afterwards, you need to add the MySQL pipeline task to your pipeline in the config file; the exact name can be found here, it is MySQLStorage. If you look at the history of commits to this repository, particularly the very first commits, you should find the full functionality of the MySQL export (in case anything other than what I mentioned earlier is required).
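If it helps, adding the pipeline task to config.cfg might look roughly like the fragment below. Only the class name MySQLStorage comes from this thread; the module path and the priority number are guesses based on how the repository's other pipeline entries are structured, so verify them against your version of the code.

```
[Scrapy]
# Keep your existing pipeline entries and append MySQLStorage.
# Module path and priority are assumptions, not verified values.
ITEM_PIPELINES = {
    'newsplease.pipeline.pipelines.MySQLStorage': 350,
}
```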
I see where I was confused. I only have another week or two to work on my project, so I don't have time to sort out MySQL now. I will take a look, though, and start to understand the code. :)
My backup plan is to connect the Docker host to an NFS share and then mount each container to its own folder. When news-please starts up, it will read its configuration from a subfolder on the NFS share and write its crawl history into its own subdirectory.
Do you think that would be a good workaround for now?
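That layout could be sketched in a Compose file like the one below. Every name in it is hypothetical (the image name, the NFS mount paths, and the in-container config location all depend on how your image is built and where news-please looks for its config directory):

```
# Hedged sketch of the NFS-backed per-container layout; all names hypothetical.
services:
  crawler-1:
    image: my-newsplease:latest          # hypothetical image name
    volumes:
      # each container mounts its own subfolder of the NFS-backed share
      - /mnt/nfs/newsplease/crawler-1:/root/news-please
  crawler-2:
    image: my-newsplease:latest
    volumes:
      - /mnt/nfs/newsplease/crawler-2:/root/news-please
```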