Hello! My name is José, but you can also call me Zé or Jubi. Welcome to my portfolio.
Here you will find the links to my project repositories, along with brief descriptions of each project and the key findings or results. If you want more detailed information, you can find it in the repositories or you can send me a message.
Please note that I am still updating my old projects, so you may come across changes from time to time.
If you have any questions or would like to get in touch, feel free to reach out to me via email at [email protected] or connect with me on LinkedIn: https://www.linkedin.com/in/joseferreiradata/.
Thank you for visiting my portfolio, and I hope you find something interesting in the projects!
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/joseferreiradata/
Repository: GitHub - Real Estate Crawler
Description: I developed a structured scraper using the Scrapy-Selenium framework to collect real estate data from the VivaReal website, focusing on the city of Recife. The main challenge was handling dynamic data, since real estate websites update their listings frequently. I also followed good scraping practice by adding a delay between requests, which avoids overloading the servers and helps comply with the websites' policies.
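To illustrate the idea of spacing out requests, here is a minimal sketch in plain Python. It is not the code from the repository: the `RequestThrottle` class and the 3-second value in the usage note are assumptions made for this example, and a Scrapy project would more commonly rely on built-in settings such as `DOWNLOAD_DELAY`.

```python
import time


class RequestThrottle:
    """Enforce a minimum pause between consecutive requests (hypothetical helper)."""

    def __init__(self, min_delay_seconds: float) -> None:
        self.min_delay_seconds = min_delay_seconds
        self._last_request_at = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last_request_at is not None:
            remaining = self.min_delay_seconds - (now - self._last_request_at)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        # Record when this request is allowed to go out.
        self._last_request_at = time.monotonic()
        return slept
```

A scraper would create one instance, for example `RequestThrottle(min_delay_seconds=3.0)`, and call `wait()` right before fetching each listing page, so the first request goes out immediately and every later one is held back until the minimum interval has passed.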
Key points: structured collection of real estate data for Recife; use of the Scrapy-Selenium framework to interact with dynamic websites; a delay between requests to avoid server overload; and organized data handling and storage using Scrapy items and pipelines.
This project showcases my ability to handle dynamic data during web scraping while applying best practices such as an appropriate delay between requests. Structuring the project around Scrapy items and pipelines keeps the collected data consistent and organized, which enables future analysis of the real estate market.
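The items and pipelines mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the `ListingItem` fields and the `CleanListingPipeline` name are hypothetical, and a local `DropItem` exception stands in for `scrapy.exceptions.DropItem` so the example runs without Scrapy installed. The `process_item(item, spider)` method is the hook Scrapy calls on each pipeline.

```python
from dataclasses import dataclass
from typing import Optional


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (assumption for this sketch)."""


@dataclass
class ListingItem:
    # Hypothetical fields; the repository's actual item may differ.
    url: str
    price: str                       # raw text as scraped, e.g. "R$ 450.000,00"
    neighborhood: str
    price_brl: Optional[int] = None  # whole reais, filled in by the pipeline


class CleanListingPipeline:
    """Normalize scraped fields and discard listings without a usable price."""

    def process_item(self, item: ListingItem, spider=None) -> ListingItem:
        # Drop the cents after the decimal comma, then keep only the digits.
        whole_part = item.price.split(",")[0]
        digits = "".join(ch for ch in whole_part if ch.isdigit())
        if not digits:
            raise DropItem(f"no price found for {item.url}")
        item.price_brl = int(digits)
        item.neighborhood = item.neighborhood.strip().title()
        return item
```

Keeping the cleaning rules in a pipeline rather than in the spider means the spider only extracts raw text, while validation and normalization happen in one place before the data is stored.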
In the Real Estate Crawler repository, you can find more details about the implementation, source code, and project configuration.