DergiPark Project

Description

DergiPark is one of the largest websites in Turkey that hosts peer-reviewed academic articles electronically. In this project, I extracted all articles from DergiPark and parsed the data under 8 main headings. I then wrote that data to different file formats such as .jsonl (JSON Lines) and .txt (plain text). The complete data set is available in the DergiPark-Data-Set repository. More output formats can be added by customizing the source code.
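As a rough illustration of the two output formats, the sketch below writes a few records to a .jsonl file and a .txt file and reads the JSON Lines file back. The record fields (`title`, `authors`, `year`), the file names and the helper functions are placeholders chosen for this example. They are not the project's actual 8 headings or its real code.

```python
import json
import os
import tempfile

# Hypothetical records: these field names are placeholders, not the
# 8 headings the project actually extracts.
articles = [
    {"title": "Örnek makale", "authors": ["A. Yazar"], "year": 2021},
    {"title": "Second sample", "authors": ["B. Author", "C. Author"], "year": 2022},
]


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps Turkish characters readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def write_txt(records, path):
    """Write each record as 'key: value' lines separated by a blank line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            for key, value in record.items():
                f.write(f"{key}: {value}\n")
            f.write("\n")


def read_jsonl(path):
    """Load a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write into a temporary directory so the example leaves no files behind
# in the working directory.
out_dir = tempfile.mkdtemp()
jsonl_path = os.path.join(out_dir, "articles.jsonl")
txt_path = os.path.join(out_dir, "articles.txt")

write_jsonl(articles, jsonl_path)
write_txt(articles, txt_path)
loaded = read_jsonl(jsonl_path)
print(len(loaded), "records read back from", jsonl_path)
```

Because every line of a .jsonl file is an independent JSON object, a large data set can be processed line by line without loading the whole file into memory.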

DergiPark currently has over 25,000 academic articles. I extracted them all through web scraping with Python. Web scraping means extracting a large amount of data from a website by fetching the source of its pages and parsing the HTML tags.
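The idea of "parsing the tags" can be sketched as follows. To keep the example runnable without third-party packages, it uses Python's built-in `html.parser` as a stand-in for BeautifulSoup. The sample markup, the `article-title` class and the `TitleParser` class are invented for illustration, and the real DergiPark pages use their own structure.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside every <h3 class="article-title"> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "h3" and ("class", "article-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False

    def handle_data(self, data):
        # Text nodes are only kept while we are inside a matching <h3>.
        if self._in_title:
            self.titles[-1] += data


# Hypothetical markup, not copied from DergiPark.
SAMPLE_HTML = """
<html><body>
  <h3 class="article-title">First article</h3>
  <h3 class="article-title">Second <i>article</i></h3>
  <h3 class="other">Not an article</h3>
</body></html>
"""

parser = TitleParser()
parser.feed(SAMPLE_HTML)
titles = [title.strip() for title in parser.titles]
print(titles)  # ['First article', 'Second article']
```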

The extracted data can be used in AI models, either to analyze it or to train models on it. Because the data consists of peer-reviewed academic articles, it can be used in any formal project.


Technologies Used

I used Python as the main programming language.

For web scraping, I used the 'BeautifulSoup' and 'requests' modules. Besides these, I used 'json', 'os' and 'time' for writing the output and for the waiting steps.
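The "waiting" role of the `time` module can be sketched as a loop that pauses between page downloads so the site is not flooded with requests. The `crawl` function, the `fake_fetch` stand-in, the URLs and the delay value are all assumptions made for this example. In a real run, the fetch function would download the page over HTTP instead.

```python
import time


def crawl(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, sleeping between consecutive calls."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            time.sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


def fake_fetch(url):
    # Stand-in for a real HTTP download, so the sketch runs offline.
    return f"<html>{url}</html>"


# Placeholder URLs, not real DergiPark addresses.
urls = ["https://example.org/a", "https://example.org/b"]

start = time.monotonic()
pages = crawl(urls, fake_fetch, delay_seconds=0.05)
elapsed = time.monotonic() - start
print(f"Fetched {len(pages)} pages in {elapsed:.2f} seconds")
```

Passing the fetch function as a parameter keeps the waiting logic separate from the download logic, which also makes the loop easy to try out without network access.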


Installation

1) Download


Download the project as an executable file from Releases and run the DergiPark.exe file.


2) Clone


Clone the project

git clone https://github.com/Alperencode/DergiPark-Project

Go to the project directory

cd DergiPark-Project

Install the required modules

pip install -r requirements.txt

Run the Python file

python main.py

Usage/Examples

Run main.py in the root directory

python main.py

Example of a successful run


Screenshots

Screenshot of the JSON Lines data


Screenshot of the txt data


Related


Authors

Contributors

alperencode

dergipark-project's Issues

Implement threading

Implementing threading should significantly improve the program's performance. I'll use the concurrent.futures module to implement it.
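A minimal sketch of what this could look like with `concurrent.futures.ThreadPoolExecutor` is shown below. The `scrape_article` function is a hypothetical stand-in that only sleeps to simulate network latency, and the worker count of 4 is an arbitrary choice. Neither is taken from the project's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_article(article_id):
    """Hypothetical stand-in for downloading and parsing one article."""
    time.sleep(0.05)  # simulates waiting on the network
    return {"id": article_id, "title": f"Article {article_id}"}


article_ids = list(range(8))

# Threads suit this workload because most of the time is spent waiting
# on I/O rather than computing.
with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map returns results in the same order as the inputs.
    results = list(executor.map(scrape_article, article_ids))

print(len(results), "articles scraped")
```

With a real scraper, the number of workers should stay modest so that the target site is not overloaded.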

Update README

Update the README screenshots for the latest update so that the recently added print messages are visible.

Implementing Tkinter

This project can be expanded with a Tkinter GUI to make it more visual. Once the graphical user interface is implemented, the program can be packaged as an executable (.exe) so that anyone can use it without having Python or any modules installed on their computer.
