Data Engineer with 4 years of experience building ETL pipelines for Big Data, migrating legacy systems and enforcing data quality control. Postdoctoral Research Scientist in Artificial Intelligence with 1 year of experience in deep learning models for time series forecasting.
Business areas I have worked in:
- Digital Humanities
- Public and Private Health
- Tourism and Travel
- Marketing (Personal Finance)
- Oil and Gas Industry
Work experience:
- End-to-end development of Python applications for ETL and Machine Learning pipelines.
- Optimization of SQL, Pandas, Polars and PySpark queries (making processes up to 30x faster).
- Communication with different teams to translate business rules into accurate database queries.
- Application of best practices for Parquet and Delta files on object storage (AWS S3 and MinIO).
- Development of data visualizations with Shiny, Streamlit, Power BI and Databricks.
- Development of data workflows using Databricks.
- Migration of legacy data workflows from Pentaho to Databricks.
- Migration of a Python application from Pandas queries to Polars.
- Development of scalable strategies for detailed data quality checks.
- Data Governance: planning the data lifecycle across different projects.