SugarPy Results

class sugarpy.results.Results(scan_rt_lookup={}, validated_results=None, monosaccharides={})

SugarPy results are stored as an instance of the SugarPy results class in the Python pickle format. The functions of the results class can be used, for example, to:

  • write results as CSV files (write_results2csv)
  • plot elution profiles (plot_glycan_elution_profile)
  • plot annotated spectra (plot_annotated_spectra)

The SugarPy results class itself is a dictionary that contains all scored_glycans as well as the spec_collector for each peptide:

dict = {
    peptide_unimod : {
        'scored_glycans': {
            spec1 : {
                glycan_tuple1 : {
                    'tree_length': int,
                    'SugarPy_score': float,
                    'num_subtrees': int,
                    'suc0r': set,
                    'formula': str,
                    'subtrees': list,
                },
                glycan_tuple2 : …
            },
            spec2 : …
        },
        'spec_collector': {
            spec1 : {
                formula1 : {
                    'vector' : list,
                    'charge' : list,
                    'trivial_name' : list,
                    'glycan_comp' : list,
                    'glycan_trees' : list,
                },
                formula2 : …
            },
            spec2: …
        }
    }
}
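
As a rough illustration of the layout above, the following sketch walks a plain dict with the same nesting and collects all glycan scores above a threshold. The peptide key, spectrum IDs, glycan tuples and scores are made-up example data, and collect_scores is a hypothetical helper, not part of SugarPy.

```python
# Stand-in for a SugarPy results object: a plain dict with the nesting
# described above. All keys and values are made-up example data.
results = {
    "PEPTIDEK#": {
        "scored_glycans": {
            1001: {
                (("Hex", 2), ("HexNAc", 2)): {"SugarPy_score": 4.2, "tree_length": 4},
                (("HexNAc", 2),): {"SugarPy_score": 1.5, "tree_length": 2},
            },
            1002: {
                (("Hex", 2), ("HexNAc", 2)): {"SugarPy_score": 3.1, "tree_length": 4},
            },
        },
        "spec_collector": {},
    }
}


def collect_scores(results, min_score=0.0):
    """Flatten scored_glycans into (peptide, spectrum, glycan, score) rows.

    Hypothetical helper: it only relies on the dict nesting shown above.
    """
    rows = []
    for peptide_unimod, entry in results.items():
        for spec, glycans in entry["scored_glycans"].items():
            for glycan_tuple, info in glycans.items():
                score = info["SugarPy_score"]
                if score >= min_score:
                    rows.append((peptide_unimod, spec, glycan_tuple, score))
    return rows


rows = collect_scores(results, min_score=2.0)
for row in rows:
    print(row)
```

Because the sketch depends only on the nesting, the same loop should in principle work on a real results object, provided its keys match the structure documented above.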

add_results(peptide_unimod=None, spec_collector=None, scored_glycans=None)

Adds results to the SugarPy results class

Keyword Arguments:

peptide_unimod (str): peptide#unimod
spec_collector (dict): dictionary returned by run.sort_results()
scored_glycans (dict): dictionary generated by run.validate_results()

calc_Y_ions(glycan_combinations=None, peptide_unimod=None, charge=2, end_monosacch='HexNAc', internal_precision=None)

Returns:
dict: { transformed mz: [name1, name2, …] }

calc_and_match_frag_ions(glycan_list=[], peptide_unimod=None, spec_id_list=[], pymzml_run=None, internal_precision=None)

Returns:
dict: {spec_id: {'oxonium_ions' : [], 'Y_ions' : [],}}

calc_oxonium_ions(glycan_combinations=None, internal_precision=None)

Returns:
dict: { transformed mz for z=1 : [name1, name2, …] }

check_peak_presence(mzml_file=None, sp_result_file=None, ms_level=1, output_file='', pyqms_params=None, rt_border_tolerance=None, min_spec_number=1, charges=[1, 2, 3, 4, 5])

Takes a SugarPy result file and an mzML file and checks the mzML file for peaks corresponding to the identified glycopeptides. If any are found, it also checks whether they were fragmented at some point during the run.

extract_best_matches(sp_result_file=None, output_file='extracted_results.csv', max_trees_per_spec=1, min_spec_number=1)

Filters a SugarPy results CSV file to extract the best-matching glycan compositions.

Keyword Arguments:

sp_result_file (str): input file path
output_file: output file name
max_trees_per_spec: maximum number of glycan compositions taken into account per spectrum
min_spec_number: minimum number of consecutive spectra required for a glycan to be accepted
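
The min_spec_number criterion can be pictured with a small sketch. How SugarPy decides that spectra are consecutive is not stated here; the sketch assumes that "consecutive" means adjacent positions in the ordered list of all spectra of the relevant MS level. The function names and the example IDs are hypothetical.

```python
def longest_run(all_spec_ids, hit_spec_ids):
    """Length of the longest run of adjacent spectra that contain a hit.

    all_spec_ids: ordered IDs of all spectra considered (assumption).
    hit_spec_ids: IDs of the spectra in which the glycan was identified.
    """
    hits = set(hit_spec_ids)
    best = current = 0
    for spec_id in all_spec_ids:
        if spec_id in hits:
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is interrupted by a spectrum without a hit
    return best


def passes_min_spec_number(all_spec_ids, hit_spec_ids, min_spec_number=1):
    """True if the glycan was seen in enough adjacent spectra."""
    return longest_run(all_spec_ids, hit_spec_ids) >= min_spec_number


# Made-up example: MS1 spectra are not numbered contiguously because
# fragment scans are recorded in between.
ms1_ids = [1, 4, 7, 10, 13]
glycan_hits = [4, 7, 13]
print(passes_min_spec_number(ms1_ids, glycan_hits, min_spec_number=2))
```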

glycan_to_tuple(glycan)

Converts a glycan (unimod style: Hex(2)HexNAc(5)) into a tuple of (monosaccharide, count) pairs
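
The conversion can be sketched with a regular expression. This is an independent illustration, not the SugarPy implementation: the function name glycan_string_to_tuple is hypothetical, and the sketch assumes that the pairs are kept in the order in which they appear in the string.

```python
import re


def glycan_string_to_tuple(glycan):
    """Parse a unimod-style glycan string into (monosaccharide, count) pairs.

    Hypothetical stand-in for illustration; pair order follows the input.
    """
    # Each unit is a name without parentheses followed by "(count)".
    pairs = re.findall(r"([^()]+)\((\d+)\)", glycan)
    return tuple((name, int(count)) for name, count in pairs)


print(glycan_string_to_tuple("Hex(2)HexNAc(5)"))
# (('Hex', 2), ('HexNAc', 5))
```

A tuple is hashable, which is what allows glycan compositions to serve as dictionary keys in the scored_glycans structure.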

parse_result_file(result_file, return_type='plot', min_spec_number=1)

Parses a SugarPy results .csv file and extracts identified peptides together with their glycans and charges.

Arguments:

result_file (str): path to the SugarPy result .csv file
return_type (str): 'plot' or 'peak_presence'

Returns:
dict: all identified peptidoforms (Peptide#Unimod:Pos) as keys; each value is a dict with the glycans as keys and {'charges': set(), 'file_names': set()} as values
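
To make the shape of the returned dict concrete, the sketch below builds the same nesting from a few made-up rows. The column layout of the SugarPy CSV file is not given here, so the rows are plain tuples; the function name group_rows and all values are hypothetical.

```python
from collections import defaultdict

# Made-up rows: (peptidoform, glycan, charge, file name).
example_rows = [
    ("PEPTIDEK#Oxidation:3", "Hex(2)HexNAc(2)", 2, "sample_a.mzML"),
    ("PEPTIDEK#Oxidation:3", "Hex(2)HexNAc(2)", 3, "sample_a.mzML"),
    ("PEPTIDEK#Oxidation:3", "HexNAc(2)", 2, "sample_b.mzML"),
]


def group_rows(rows):
    """Group rows into {peptidoform: {glycan: {'charges', 'file_names'}}}."""
    grouped = defaultdict(
        lambda: defaultdict(lambda: {"charges": set(), "file_names": set()})
    )
    for peptidoform, glycan, charge, file_name in rows:
        entry = grouped[peptidoform][glycan]
        entry["charges"].add(charge)
        entry["file_names"].add(file_name)
    # Convert the nested defaultdicts back into plain dicts.
    return {pep: dict(glycans) for pep, glycans in grouped.items()}


grouped = group_rows(example_rows)
print(grouped["PEPTIDEK#Oxidation:3"]["Hex(2)HexNAc(2)"])
```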

plot_annotated_spectra(mzml_file=None, plot_peak_types=['matched', 'unmatched', 'labels'], remove_subtrees=[], plot_molecule_dict=None, peak_colors={'labels': (0, 0, 200), 'matched': (0, 200, 0), 'raw': (100, 100, 100), 'unmatched': (200, 0, 0)}, ms_level=1, output_folder='', ms_precision='5e-6', plotly_layout=None)

Plots one or multiple spectra (raw data). The following peak types can be added:

  • matched (peaks matched by pyQms)
  • unmatched (unmatched peaks from matched formulas)
  • labels (for monoisotopic peaks)

plot_glycan_elution_profile(peptide_list=None, min_sugarpy_score=0, min_sub_cov=0.0, x_axis_type='retention_time', score_type='top_score', output_file=None, title=None, scan_rt_lookup=None, plotly_layout=None)

Plots elution profile(s) for identified glycopeptide(s).

Keyword Arguments:

peptide_list (list): list of peptide#unimod for which elution profiles should be plotted
output_file (str): output file name
min_sugarpy_score (float): minimum SugarPy score (glycan compositions with lower scores are not returned)
min_sub_cov (float): minimum subtree coverage (glycan compositions with lower sub_cov are not returned)
x_axis_type (str): plot by spectrum_id or retention_time (x-axis)
score_type (str): plot by best score (top_score) or sum of scores (sum_scores) for each spectrum (y-axis)
title (str): title of the plot
plotly_layout (dict): plotly layout used for the plot

Returns:
str: output file name
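
The difference between the two score_type options can be sketched as follows. This is not the SugarPy plotting code: it assumes that the scores of all glycan entries of one spectrum are aggregated, and score_per_spectrum as well as the example values are hypothetical.

```python
def score_per_spectrum(scored_glycans, score_type="top_score"):
    """Reduce the SugarPy scores of each spectrum to a single y value.

    scored_glycans: {spec: {glycan_tuple: {'SugarPy_score': float, ...}}}
    """
    per_spec = {}
    for spec, glycans in scored_glycans.items():
        scores = [info["SugarPy_score"] for info in glycans.values()]
        if not scores:
            continue  # nothing to plot for this spectrum
        if score_type == "top_score":
            per_spec[spec] = max(scores)
        elif score_type == "sum_scores":
            per_spec[spec] = sum(scores)
        else:
            raise ValueError("unknown score_type: {0}".format(score_type))
    return per_spec


# Made-up example data.
example_scores = {
    1001: {"glycan_a": {"SugarPy_score": 4.0}, "glycan_b": {"SugarPy_score": 1.5}},
    1002: {"glycan_a": {"SugarPy_score": 3.0}},
}
print(score_per_spectrum(example_scores, "top_score"))
print(score_per_spectrum(example_scores, "sum_scores"))
```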

plot_molecule_elution_profile(plot_molecule_dict=None, output_file=None, title=None, include_subtrees='no_subtrees', monosaccharides=None, scan_rt_lookup=None, x_axis_type='retention_time', plotly_layout=None)

Plots an elution profile for molecules (chemical compositions). This can be used, e.g., to plot separate elution profiles for fragment ions of a glycan composition.

plot_scatter_vec_id(x_axis_list=None, y_axis_list=None, text_list=None, name_list=None, title=None, x_axis_name=None, y_axis_name=None, output_file=None, plotly_layout=None)

Creates a scatter plot from the given x and y data. Both must be given as a list of lists, with each inner list representing one trace in the final plot. Values can be annotated using a text_list, and traces can be named using a name_list.

sort_glycan_trees(scored_glycan_trees=None, tuple_pos=1, sort_by='SugarPy_score')

Sorts glycan compositions, e.g. by the SugarPy_score.

Returns:
dict

sort_plot_lists(name_list=None, x_axis_list=None, y_axis_list=None, text_list=None)

Sorts name_list alphabetically into a new list and appends the elements of x_axis_list, y_axis_list and text_list to new lists in the matching order.
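
One common way to keep several lists aligned while sorting by one of them is to sort the indices first. The sketch below illustrates that idea; sort_parallel_lists is a hypothetical name and the code is not taken from SugarPy.

```python
def sort_parallel_lists(name_list, x_axis_list, y_axis_list, text_list):
    """Sort four parallel lists by the alphabetical order of name_list."""
    # Sort the positions by name, then reorder every list with them.
    order = sorted(range(len(name_list)), key=lambda i: name_list[i])
    return (
        [name_list[i] for i in order],
        [x_axis_list[i] for i in order],
        [y_axis_list[i] for i in order],
        [text_list[i] for i in order],
    )


# Made-up traces: one inner list per trace.
names, xs, ys, texts = sort_parallel_lists(
    ["HexNAc(2)", "Hex(1)HexNAc(2)"],
    [[10.0, 10.5], [11.0]],
    [[1.5, 2.0], [4.0]],
    [["a", "b"], ["c"]],
)
print(names)
```

The inputs are left unchanged, and new lists are returned, which matches the description above.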

write_results2csv(output_file=None, max_trees_per_spec=5, min_sugarpy_score=0, min_sub_cov=0.0, peptide_lookup=None, monosaccharides=None, scan_rt_lookup=None, mzml_basename=None)

Writes a CSV file containing a summary of all results stored in the results class.

Keyword Arguments:

output_file (str): output file name
max_trees_per_spec (int): maximum number of glycan compositions returned for one spectrum
min_sugarpy_score (float): minimum SugarPy score (glycan compositions with lower scores are not returned)
min_sub_cov (float): minimum subtree coverage (glycan compositions with lower sub_cov are not returned)
peptide_lookup (dict): dictionary returned by run.parse_ident_file()
monosaccharides (dict): dictionary containing name and chemical composition of monosaccharides
scan_rt_lookup (dict): dictionary containing the retention time for each spectrum
mzml_basename (str): name of the mzML or sample

Returns:
str: output file name