I’m excited to announce that I have been hired as a Machine Learning Intern for the next three months at Waters Corporation.
My responsibilities involve developing lightweight Python web apps for mass spectrometry imaging (MSI) using Streamlit, with deployment through Docker and AWS. I am also responsible for unit and end-to-end (e2e) testing of the web apps, documentation, and version control with Git.
MSI data is multi-dimensional, with 2-3 spatial dimensions plus a spectral dimension that quantifies the presence of thousands of analytes, each at a specific m/z value. Fortunately, my past experience with multi-dimensional preclinical imaging data is already coming in handy!
Right now I’m working on two apps:
MSI Viewer
MSI Viewer provides interactive visualization of 2D MSI analyte data. It contains widgets and controls for viewing 2D intensity images at specific m/z values, as well as the spectral data at a given x-y location, with interactivity enabled through the Bokeh library.
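To make the two views concrete, here is a minimal sketch of how a 2D MSI dataset can be sliced, using NumPy on synthetic data. The (y, x, m/z) array layout, the m/z range, and the helper names `ion_image` and `pixel_spectrum` are assumptions for illustration only, not the app's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 2D MSI dataset: 32 x 48 pixels, 1000 m/z bins.
mz_axis = np.linspace(100.0, 1000.0, 1000)   # hypothetical m/z range
cube = rng.random((32, 48, mz_axis.size))    # assumed layout: (y, x, m/z)


def ion_image(cube, mz_axis, target_mz):
    """Return the 2D intensity image at the m/z bin closest to target_mz."""
    idx = int(np.argmin(np.abs(mz_axis - target_mz)))
    return cube[:, :, idx]


def pixel_spectrum(cube, x, y):
    """Return the full spectrum recorded at pixel (x, y)."""
    return cube[y, x, :]


image = ion_image(cube, mz_axis, target_mz=500.0)   # one image per m/z value
spectrum = pixel_spectrum(cube, x=10, y=5)          # one spectrum per pixel
print(image.shape, spectrum.shape)
```

In a viewer, `image` would feed the intensity plot and `spectrum` the line plot that updates when a pixel is selected.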
MSI Classifier
MSI Classifier provides an interactive pipeline for unsupervised tissue and analyte classification of 2D MSI data.
For tissue classification, I am using manifold learning to embed each x-y pixel's ~1000-element spectrum into 2-3 dimensions, which allows for efficient hierarchical density-based clustering.
While tissue classification groups similar x-y pixels together based on their mass spectral signature, analyte clustering instead groups similar spectral points together based on their spatial distributions. This allows researchers to discover linked distribution patterns of different analytes, which can aid in pharmaceutical research.
For analyte clustering, I am using Ward’s hierarchical clustering to cluster and label each analyte image based on a user-defined maximum number of clusters.
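A minimal sketch of this step using SciPy's hierarchical clustering routines is shown below. The synthetic data, the (y, x, analyte) layout, and the unit-norm scaling of each image are assumptions for illustration; only the use of Ward's method and a maximum cluster count comes from the description above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic data: 16 x 16 pixels, 30 analytes (one 2D image per analyte).
height, width, n_analytes = 16, 16, 30
cube = rng.random((height, width, n_analytes))   # assumed layout: (y, x, analyte)

# One row per analyte: each row is that analyte's flattened 2D image.
analyte_images = cube.reshape(-1, n_analytes).T

# Assumption: scale each image to unit norm so that clustering reflects the
# spatial pattern rather than the overall intensity.
norms = np.linalg.norm(analyte_images, axis=1, keepdims=True)
analyte_images = analyte_images / norms

# Ward's hierarchical clustering on the analyte images.
Z = linkage(analyte_images, method="ward")

# Cut the tree so that at most `max_clusters` clusters are formed.
max_clusters = 5   # would come from user input in an app
labels = fcluster(Z, t=max_clusters, criterion="maxclust")
print(labels)
```

With `criterion="maxclust"`, the labels start at 1 and the tree is cut so that no more than `max_clusters` groups are returned.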