dc.contributor.advisor | Jaakkola, Tommi S. | |
dc.contributor.author | Wang, Chenyu | |
dc.date.accessioned | 2025-03-27T17:00:14Z | |
dc.date.available | 2025-03-27T17:00:14Z | |
dc.date.issued | 2025-02 | |
dc.date.submitted | 2025-03-04T17:29:05.891Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/158954 | |
dc.description.abstract | High-throughput drug screening – using cell imaging or gene expression measurements as readouts of drug effect – is a critical tool in biotechnology for assessing and understanding the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. Experiments on drug screening data show that InfoCORE achieves superior performance across multiple tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that addresses general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | A Variational Lower Bound to Mitigate Batch Effect in Molecular Representations | |
dc.type | Thesis | |
dc.description.degree | S.M. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |