OpenMS Invites the Computational Mass Spectrometry Community to Join Google Summer of Code 2026!

For Google Summer of Code (GSoC) 2026, OpenMS is planning to apply as an umbrella organization, and we would like to invite other projects and groups within the computational mass spectrometry, proteomics, and metabolomics communities to join us in this effort.


GSoC Contributors

  • Make sure you are eligible to participate in GSoC 2026.
  • Review the list of themes and the projects under each theme. If you have specific questions about a project, our mentors are active on Discord and happy to help.
  • Follow our instructions below on how to submit a proposal to us.

Submitting an Application:

  • Your proposal must be uploaded to the GSoC website before the official deadline. Make sure your CV and contact information are included in the proposal document.
  • We strongly recommend getting in touch with the mentors before submitting your proposal.

Available Projects

Theme A) Data Formats and Interoperability

1) imzML Parser in OpenMS

Proposed Mentors: OpenMS Team
Skills: C++, XML Parsing, Mass Spectrometry Imaging
Estimated Project Length: 350 hours | Difficulty: Medium

Mass spectrometry imaging (MSI) is a powerful analytical technique that enables spatial mapping of molecules in biological tissues. The imzML format is the open standard for storing MSI data and consists of two files: an XML metadata file (.imzML) and a binary data file (.ibd). While imzML is widely used in the MSI community, OpenMS currently lacks native support for reading and writing this format.

The goal of this project is to implement a robust imzML parser in OpenMS, enabling seamless integration of mass spectrometry imaging data into existing OpenMS workflows.

Tasks:

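  1. Implement a C++ imzML reader that can parse both the continuous and the processed imzML storage modes (in continuous mode all spectra share a single m/z array; in processed mode each spectrum stores its own).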
  2. Implement an imzML writer to enable export of imaging data.
  3. Add support for common imzML metadata (coordinates, pixel size, spectrum parameters).
  4. Integrate the parser with existing OpenMS data structures (MSExperiment, MSSpectrum).
  5. Write comprehensive unit tests and validate against reference imzML datasets.
  6. Document the new functionality and provide usage examples.
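
The two storage modes in task 1 differ mainly in how the binary arrays are referenced from the XML. The sketch below assumes the XML has already been parsed into per-pixel array references (a byte offset and a value count into the .ibd file) and that both arrays are stored as little-endian 32-bit floats read on a little-endian host. The struct and function names are hypothetical and are not part of the OpenMS API. Consecutive pixels that point at the same m/z array, which is always the case in continuous mode, reuse the already decoded array instead of reading it again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reference to one binary array in the .ibd file, as taken from the
// "external offset" and "external array length" cvParams in the .imzML XML.
struct ExternalArrayRef {
  std::uint64_t offset = 0;  // byte offset into the .ibd file
  std::uint64_t length = 0;  // number of values in the array
};

// Hypothetical per-pixel record produced by the XML pass.
struct PixelSpectrumRef {
  int x = 0;
  int y = 0;
  ExternalArrayRef mz;
  ExternalArrayRef intensity;
};

// Plain stand-in for a decoded spectrum (a real reader would fill MSSpectrum).
struct DecodedSpectrum {
  int x = 0;
  int y = 0;
  std::vector<double> mz;
  std::vector<double> intensity;
};

// Reads ref.length 32-bit floats starting at ref.offset.
// Assumption: float32 values in little-endian byte order on a little-endian host.
std::vector<double> readFloat32Array(std::istream& ibd, const ExternalArrayRef& ref) {
  std::vector<float> buffer(static_cast<std::size_t>(ref.length));
  ibd.clear();  // reset any EOF state left by a previous read
  ibd.seekg(static_cast<std::streamoff>(ref.offset));
  ibd.read(reinterpret_cast<char*>(buffer.data()),
           static_cast<std::streamsize>(buffer.size() * sizeof(float)));
  if (!ibd) {
    throw std::runtime_error("array extends past the end of the .ibd file");
  }
  return std::vector<double>(buffer.begin(), buffer.end());
}

// Decodes every pixel. The m/z array is only re-read when its offset changes, so in
// continuous mode it is decoded once; in processed mode it is read for every pixel.
std::vector<DecodedSpectrum> decodeSpectra(std::istream& ibd,
                                           const std::vector<PixelSpectrumRef>& pixels) {
  std::vector<DecodedSpectrum> spectra;
  spectra.reserve(pixels.size());
  bool haveMz = false;
  std::uint64_t cachedOffset = 0;
  std::vector<double> cachedMz;
  for (const PixelSpectrumRef& p : pixels) {
    if (!haveMz || p.mz.offset != cachedOffset) {
      cachedMz = readFloat32Array(ibd, p.mz);
      cachedOffset = p.mz.offset;
      haveMz = true;
    }
    spectra.push_back({p.x, p.y, cachedMz, readFloat32Array(ibd, p.intensity)});
  }
  return spectra;
}
```

A production reader would additionally honour each array's data-type and compression settings, check the 16-byte UUID at the start of the .ibd file against the one declared in the XML, and map pixel coordinates onto OpenMS data structures rather than the plain struct used here.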

2) Full Python Bindings Using Nanobind

Proposed Mentors: OpenMS Team
Skills: Python, C++, Nanobind, Cython
Estimated Project Length: 350 hours | Difficulty: Medium to Advanced

PyOpenMS provides Python bindings for OpenMS, enabling Python developers to access the powerful mass spectrometry algorithms implemented in OpenMS. Currently, these bindings are generated using Cython via the autowrap package. While functional, this approach has limitations in terms of maintenance overhead, compile times, and certain Python integration features.

Nanobind is a modern C++17 library for creating Python bindings with minimal overhead. An existing prototype demonstrates the feasibility of using nanobind for OpenMS Python bindings. This project aims to build upon this prototype to create comprehensive nanobind-based Python bindings for OpenMS.

A key deliverable of this project is a thorough evaluation comparing nanobind-based bindings against the current autowrap/Cython implementation. This evaluation should cover:

  • Performance: Binding overhead, memory usage, call latency
  • Build System: Compile times, dependency management, CI/CD integration
  • Usability: Python API ergonomics, documentation generation, IDE support
  • Maintenance: Code complexity, debugging experience, update workflow
  • Compatibility: Python version support, platform compatibility, NumPy/SciPy integration

Tasks:

  1. Evaluate the existing nanobind prototype and identify gaps in coverage.
  2. Extend the nanobind bindings to cover core OpenMS classes and algorithms.
  3. Implement automatic binding generation where feasible to reduce maintenance burden.
  4. Conduct systematic benchmarking comparing nanobind vs. autowrap/Cython bindings.
  5. Document advantages and disadvantages of each approach with concrete examples.
  6. Write unit tests ensuring feature parity with existing PyOpenMS functionality.
  7. Provide a recommendation report for future binding strategy based on findings.

3) Accelerating OpenSwathWorkflow for Large-Scale In Silico Spectral Libraries

Proposed Mentors: Joshua Charkow
Skills: C++, Algorithm Optimization, Profiling
Estimated Project Length: 200 hours | Difficulty: Medium

OpenSwathWorkflow is a central component of OpenMS for Data Independent Acquisition (DIA) analysis, enabling targeted extraction and scoring of chromatographic signals using spectral libraries. While OpenSwathWorkflow performs well for conventional experimental libraries, the increasing adoption of large in silico–generated spectral libraries presents substantial computational challenges. Such libraries can contain millions of precursors, leading to increased memory usage, longer runtimes, and scalability bottlenecks in candidate selection and scoring.

This project aims to analyze and improve the computational performance and scalability of OpenSwathWorkflow, with a particular focus on workflows using very large in silico spectral libraries. The goal is to identify bottlenecks, redesign performance-critical components where necessary, and introduce optimizations that enable efficient processing without compromising identification quality.
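
Candidate selection is one of the scalability bottlenecks mentioned above. As an illustration of the kind of change the project could evaluate, and not a description of how OpenSwathWorkflow currently selects precursors, the sketch below sorts a library once by precursor m/z so that the precursors falling into an isolation window are found with two binary searches: O(log n + k) per window for k hits, instead of O(n) for a scan over the whole library. The types are hypothetical, and a real implementation would also restrict candidates by retention time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal library entry holding only what the range query needs.
struct LibraryPrecursor {
  std::string id;
  double precursorMz = 0.0;
};

// Sorted once by precursor m/z; each window query is then two binary searches.
class PrecursorIndex {
 public:
  explicit PrecursorIndex(std::vector<LibraryPrecursor> entries)
      : entries_(std::move(entries)) {
    std::sort(entries_.begin(), entries_.end(),
              [](const LibraryPrecursor& a, const LibraryPrecursor& b) {
                return a.precursorMz < b.precursorMz;
              });
  }

  // Returns the precursors whose m/z lies in the half-open window [lower, upper).
  std::vector<const LibraryPrecursor*> inWindow(double lower, double upper) const {
    assert(lower <= upper);
    auto byMz = [](const LibraryPrecursor& p, double mz) { return p.precursorMz < mz; };
    auto first = std::lower_bound(entries_.begin(), entries_.end(), lower, byMz);
    auto last = std::lower_bound(first, entries_.end(), upper, byMz);
    std::vector<const LibraryPrecursor*> hits;
    hits.reserve(static_cast<std::size_t>(last - first));
    for (auto it = first; it != last; ++it) hits.push_back(&*it);
    return hits;
  }

 private:
  std::vector<LibraryPrecursor> entries_;
};
```

The half-open window is a design choice that keeps a precursor sitting exactly on a boundary from being assigned to two adjacent windows; DIA schemes with overlapping windows would need an explicit policy for such cases.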

A key deliverable of this project is a systematic performance evaluation of OpenSwathWorkflow before and after optimization.
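
Wall-clock comparisons between versions are more informative when each measurement is repeated. A minimal harness, assuming the workload under test (for example, one OpenSwathWorkflow run on the benchmarking dataset) can be wrapped in a callable, could report the median of several runs; the function name is hypothetical, and peak memory usage would have to be measured separately, for example with an external profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Runs `workload` `repeats` times and returns the median wall-clock time in
// milliseconds. The median is less sensitive than the mean to one-off outliers
// such as a cold file-system cache on the first run.
double medianRuntimeMs(const std::function<void()>& workload, std::size_t repeats) {
  if (repeats == 0) throw std::invalid_argument("repeats must be positive");
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
  }
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  return samples.size() % 2 == 1 ? samples[mid] : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

std::chrono::steady_clock is used because, unlike the system clock, it is monotonic and unaffected by clock adjustments during a long run.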

Tasks:

  1. Develop a comprehensive understanding of the OpenSwathWorkflow algorithm.
  2. Develop a benchmarking dataset for profiling.
  3. Profile OpenSwathWorkflow to identify computational bottlenecks.
  4. Identify algorithmic bottlenecks and propose changes.
  5. Experiment with alternative algorithms, drawing inspiration from other open-source DIA projects.
  6. Validate that the optimized implementation provides comparable results to the original implementation and other DIA software tools.
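
For task 6, one simple summary is the overlap between the precursors identified by the original and the optimized implementation on the same data. The sketch assumes both runs have already been filtered at the same FDR threshold and reduced to sets of precursor identifiers; the names are hypothetical, and a complete validation would also compare scores and quantities, not only which precursors were identified.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Counts of identifications found by both runs or by only one of them.
struct IdentificationOverlap {
  std::size_t shared = 0;
  std::size_t onlyBaseline = 0;
  std::size_t onlyCandidate = 0;

  // Jaccard index |A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty.
  double jaccard() const {
    const std::size_t unionSize = shared + onlyBaseline + onlyCandidate;
    return unionSize == 0 ? 1.0 : static_cast<double>(shared) / unionSize;
  }
};

IdentificationOverlap compareIdentifications(const std::set<std::string>& baseline,
                                             const std::set<std::string>& candidate) {
  IdentificationOverlap o;
  for (const std::string& id : baseline) {
    if (candidate.count(id) != 0) {
      ++o.shared;
    } else {
      ++o.onlyBaseline;
    }
  }
  assert(o.shared <= candidate.size());
  o.onlyCandidate = candidate.size() - o.shared;
  return o;
}
```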