Data Science Textbook

Website built using Jupyter Book, Sphinx and GitHub Actions to support interactive delivery of data science content to my apprentices at Multiverse. The website is password protected and is used only by Multiverse apprentices.


Drug-Kinase binding project

Predicting drug-kinase binding interactions using SMILES and protein sequences.

  • Created a drug-target interaction dataset using data from synapse.org, ebi.ac.uk and uniprot.org.
  • Applied several featurization methods to the SMILES strings and protein sequences.
  • Used different NLP methods to embed the protein sequences.
  • Predicted pChEMBL values using an ensemble regression model, achieving an RMSE of 0.88.
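
The featurization and evaluation steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual pipeline: the k-mer counting scheme, the simple averaging ensemble, the function names and the toy inputs are all assumptions made for illustration. Note that single-character tokens are a crude stand-in for real SMILES tokenization (for example, "Cl" would be split into two tokens).

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a string (bag-of-k-mers features)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def ensemble_mean(predictions_per_model):
    """Average the predictions of several models, sample by sample."""
    return [sum(values) / len(values) for values in zip(*predictions_per_model)]


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy inputs, made up for illustration only (not project data).
protein_features = kmer_counts("MKKLLKK", k=2)  # hypothetical protein fragment
smiles_features = kmer_counts("CCO", k=1)       # ethanol, character-level tokens

# Hypothetical pChEMBL predictions from two models for two compounds.
averaged = ensemble_mean([[6.0, 7.0], [8.0, 7.0]])
error = rmse([7.0, 8.0], averaged)
```

Averaging is only one way to combine regressors; stacking or weighted blending would slot into `ensemble_mean` in the same place.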


Spotify Song Analysis

Analysing seasonal and economic trends in song attributes based on release date.

  • Obtained data on 200,000 songs from the last 20 years using the Spotify API.
  • Performed hypothesis testing using ANOVA.
  • Found no statistically significant change in the energy or acousticness of songs based on whether they were released in summer or winter.
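
The ANOVA test mentioned above compares the variance between groups with the variance within groups. Below is a minimal sketch of the one-way F statistic in plain Python. The function name and the toy energy values are assumptions for illustration and are not taken from the Spotify dataset.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy "energy" values for songs released in each season (made up).
summer_energy = [0.62, 0.70, 0.66]
winter_energy = [0.60, 0.68, 0.64]
f_statistic = one_way_anova_f(summer_energy, winter_energy)
```

A small F statistic means the seasonal means differ little compared with the spread inside each season. To get a p-value as well, a library routine such as `scipy.stats.f_oneway` can be used instead of this hand-rolled version.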