Projects

Kidney-Disease-Classification-Deep-Learning-Project

“Kidney Disease Classification with MLflow” is a machine learning project focused on developing and deploying a predictive model for diagnosing kidney diseases. MLflow, a popular open-source platform for managing machine learning workflows, is used to streamline model development. The project trains and evaluates machine learning algorithms on a dataset of kidney disease-related features to produce a robust classification model. MLflow handles model tracking, experimentation, and deployment, which supports reproducibility and scalability for this medical tool.

Sensor-Fault-Detection Project

In this project, the system in focus is the Air Pressure System (APS), which generates pressurized air used in various truck functions, such as braking and gear changes. The dataset's positive class corresponds to failures of a specific APS component. The negative class corresponds to trucks with failures in components unrelated to the APS. The goal is to reduce the cost of unnecessary repairs, so false predictions must be minimized.

Sentiment Analysis using Natural Language Processing MLOps/AIOps Project

“NLP Classification with MLflow and DVC” is a project that combines natural language processing (NLP) and machine learning for text classification tasks. MLflow, a machine learning lifecycle management tool, is used to orchestrate the development and deployment of NLP models. Data Version Control (DVC) is integrated into the workflow to manage and version the large datasets typically associated with NLP tasks. The project aims to streamline the end-to-end process of training, tracking, and deploying NLP classification models, ensuring reproducibility and scalability while handling complex and evolving text data.

Sales Forecasting using Machine learning

This project is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM). A data science project is often pictured as linear: data preparation, modeling, evaluation, and deployment. Under the CRISP-DM methodology, however, the project becomes cyclical: even after it ends in Deployment, it can restart from Business Understanding. How might it help?

Stock Market Kafka Data Engineering Project

Executed an end-to-end data engineering project on real-time stock market data using Apache Kafka. The stack included Python, Amazon Web Services (AWS), Apache Kafka, AWS Glue, Athena, and SQL.
Programming Languages: Python
Cloud Services: Amazon Web Services (AWS): S3 (Simple Storage Service), Athena, Glue Crawler, Glue Catalog, EC2 (Elastic Compute Cloud)
Streaming Platform: Apache Kafka

Data Engineering YouTube Analysis Project

The project focuses on the efficient management and analysis of YouTube video data, categorized by trends and metrics. Key goals include creating a data ingestion mechanism, transforming raw data through an ETL system, establishing a centralized data repository (Data Lake), ensuring scalability, and leveraging AWS for large-scale data processing. AWS services used include Amazon S3 for storage, AWS IAM for access management, QuickSight for BI insights, AWS Glue for data integration, AWS Lambda for serverless computing, and AWS Athena for S3 queries.

LangChain App-Large Language Modelling Project

The MultiPDF Chat App is a Python-based application for chatting with multiple PDF documents. Users can hold natural language conversations and ask questions about the content of these PDFs. The application uses a language model to give precise and contextually appropriate answers. Because it is tailored to the loaded PDFs, the quality of its responses depends on how relevant the questions are to those documents.

Data Scientist

Technical Skills: Python, R, SQL, AWS, Airflow, Kafka

Education

Ph.D., Computer Science, Leeds Beckett University (May 2021)
M.S., Telecommunication and Internet Technologies, University of Applied Science Technikum, Vienna, Austria (December 2016)
B.S., Computer Science with Economics, Obafemi Awolowo University (May 2012)

Work Experience

Data Scientist/Machine Learning Research Engineer (April 2021 - October 2023, United Kingdom)

Collaborated in the maintenance and development of reproducible data products and tools using statistical programming languages tailored for health data science.
Leveraged innovative approaches to analyze and interpret complex epidemiological data, ensuring meticulous data quality and integrity.
Used database queries to undertake comprehensive analysis, extracting insights from large health datasets and managing the full data lifecycle from extraction to visualization.
Actively engaged in regular brainstorming sessions with colleagues across diverse fields, ensuring alignment with Health Protection Operations’ objectives and promoting reproducible analytical pipelines.
Proactively checked for data anomalies prior to and during analyses, initiating corrective actions and liaising with principal analysts.
Played a vital role in promoting quality assurance procedures and best practices, consistently upholding corporate policies relating to data security and confidentiality.
Improved predictive health data analytics model performance through efficient data analysis and feature engineering techniques.
Performed data extraction and cleaning from UK Biobank for relevant epidemiological research projects.
Adopted statistical programming languages for processing and analysing diverse large-scale health datasets.
Attended scientific seminars, meetings, and training (DNAnexus) as appropriate, and presented research findings.
Adhered to good data management practices by following GDPR protocols.
Authored articles in layman-friendly language, simplifying complex data insights for a wider audience.
Pioneered the implementation of machine learning models in epidemiological studies, focusing on the identification of mental health disorders and on enhancing preventive interventions.
Reduced clinical diagnosis time by 30% by optimising machine learning algorithm parameters in Python.
Independently took the initiative to identify relevant epidemiology projects in UK Biobank and to ensure that each project's focus was well defined.
Improved model performance by 40% through data analysis and feature engineering techniques.

Data Science/Artificial Intelligence Project Facilitator (Volunteer) (January 2023 - April 2023, United Kingdom)

Communicated complex technical concepts to diverse stakeholders, including executives, product managers, and principal investigators.
Developed and maintained data visualizations and dashboards to monitor product performance and communicate data-driven insights to stakeholders.
Conducted comprehensive data analysis and provided actionable insights to guide product development decisions.
Proactively managed project timelines and scope to ensure successful and timely project completion.
Used Asana, Kanban, and other tools to create project plans, track progress, manage resources, and report status.
Led data planning for data-driven, evidence-based projects, defining metrics and KPIs and creating full measurement frameworks.

Leeds Beckett University @ Balfour Beatty plc / Leeds Beckett University (KTP) (November 2020 - March 2021, United Kingdom)

Designed and implemented plug-ins into Revit software (Building Information Modeling) using the .NET framework.
Enhanced health hazard status prediction by developing and optimizing machine learning models such as Random Forest and Neural Networks.
Created a comprehensive blueprint for hazard risk identification by using the system development life cycle.
Reduced employee hazard risk by 40% by integrating machine learning model interpretation in identifying risks in building information models.
Built end-to-end machine learning pipelines from data ingestion to model deployment.

Leeds Beckett University @ Leeds Beckett University (September 2017 - October 2020, United Kingdom)

Boosted machine learning model performance by implementing feature dimensionality reduction techniques on large bio-datasets.
Increased classification performance by 20% by developing and hybridizing machine learning models for long-term health status predictions.
Achieved over 80% accuracy in most models by conducting predictive analysis on health datasets for obesity management, using classification analysis with machine learning algorithms.

Data Science Insight Analyst @ Sonorys Technology GmbH (January 2015 - August 2017, Austria)

Boosted monthly cost revenue prediction by implementing and optimizing machine learning algorithms and data visualization tools to create clear and engaging reports.
Enhanced machine learning model accuracy by 30% using hyperparameter optimization, resulting in better business insights and outcomes for stakeholders.
Led a sales-price forecasting project using automated Google Sheets to streamline the process and increase efficiency.

Talks & Lectures

William Harvey Annual Conference: August 2022. Presented an abstract on the interplay of thyroid function with post-traumatic disorder using Mendelian randomisation.
Future Generation Computer Systems Conference: July 2020. Presented seminars on Performance Comparative Analysis Between Machine Learning Models And Dynamic Model In Predicting Short Term Body Weight.
Leeds Beckett University, United Kingdom: 2018-2019. Presented seminars on Emerging Technologies Study: 5G and Internet of Things.
University of Copenhagen, Denmark: 2018. Presented a seminar on the application of K-means Clustering (Unsupervised Learning) to Weight-loss categorisation.
University of Applied Science Technikum, Vienna, Austria: 2016. Presented a seminar on the Internet of Things Using Smart Homes and Cities as a study case. Presented a seminar on Evaluating the Integration of the Internet of Things (Wearable Technologies) with Big Data Analytics (Study case: E-Health).

Publications

A Data Analytic and Machine Learning Interpretation approach for Predicting patient attrition in dietary weight loss interventions (2023, in view)
Interpretable Machine Learning Model for Weight-Loss Prediction (2023, in view)
Multi-trait analysis characterizes the genetics of thyroid function and identifies causal associations with clinical implications (2023, in view)
The Role of Thyroid Function in Borderline Personality Disorder and Schizophrenia: A Mendelian Randomization Study (2023, in view)
Marouli, E., Yusuf, L., Kjaergaard, A. D., Omar, R., Kuś, A., Babajide, O., Sterenborg, R., Åsvold, B. O., Burgess, S., Ellervik, C., Teumer, A., Medici, M. & Deloukas, P. (2021) Thyroid Function and the Risk of Alzheimer’s Disease: A Mendelian Randomization Study. Thyroid, 31 (12) December, pp. 1794–1799.
O. Babajide et al., “A Machine Learning Approach to Short-term Body Weight Prediction in a Dietary Intervention Program,” International Conference on Computational Science, Amsterdam, Netherlands, 2020.
O. Babajide et al., “Application of Unsupervised Learning in Weight-Loss Categorisation for Weight Management Programs,” 2019 10th International Conference on Dependable Systems, Services and Technologies (DESSERT), Leeds, United Kingdom, 2019, pp. 94-101.