MetaX Cookbook

This is the guidebook for the MetaX GUI version. If you are running the analysis from the CLI, we recommend reading the documentation for each MetaX module for command-line instructions.

Overview

MetaX is a novel tool for linking peptide sequences with taxonomic and functional information in metaproteomics. We introduce the Operational Taxon-Function (OTF) concept to explore microbial roles and interactions ("who is doing what and how") within ecosystems.

MetaX also features statistical modules and plotting tools for analyzing peptides, taxa, functions, proteins, and taxon-function contributions across groups.

abstract

Project Page

Visit GitHub to get more information:

https://github.com/byemaxx/MetaX

Getting Started

main_window

tools_menu


Exploring Data with MetaX

See the Preparing Your Data section to build the database and annotate peptides to OTFs before starting.

Module 1. OTF Analyzer

After obtaining the Operational Taxa-Functions (OTF) Table using the Peptide Annotator, you can perform downstream analysis with the OTF Analyzer.

1. Data Preparation

OTFs (Operational Taxa-Functions) Table: Obtained from the Peptide Annotator module.

Meta Table: The first column contains the sample names, and each remaining column defines a grouping of the samples. If no meta table is provided, two groupings are generated automatically: (1) all samples in the same group; (2) each sample in its own group.

Example Meta Table:

samples Individuals Treatment Sweetener
sample_1 V1 Treatment XYL
sample_2 V1 Treatment XYL
sample_3 V1 Treatment XYL
sample_4 V1 Control PBS
sample_5 V1 Control PBS
sample_6 V1 Control PBS
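
A meta table like the one above can be prepared in a spreadsheet or with a short script. The following is a minimal sketch using only the Python standard library. The tab separator is an assumption (the example above is shown whitespace-aligned), and the helper names and the column names `all_samples` and `each_sample` are hypothetical, chosen only to illustrate the two automatic groupings described above.

```python
import csv
import io

# Rows mirror the example meta table; the first column holds sample names.
HEADER = ["samples", "Individuals", "Treatment", "Sweetener"]
ROWS = [
    ["sample_1", "V1", "Treatment", "XYL"],
    ["sample_2", "V1", "Treatment", "XYL"],
    ["sample_3", "V1", "Treatment", "XYL"],
    ["sample_4", "V1", "Control", "PBS"],
    ["sample_5", "V1", "Control", "PBS"],
    ["sample_6", "V1", "Control", "PBS"],
]


def write_meta_table(handle, header, rows):
    """Write a meta table as tab-separated text (the separator is an assumption)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def default_meta(samples):
    """Illustrate the two automatic groupings used when no meta table is given.

    The column names are hypothetical, not MetaX's own.
    """
    header = ["samples", "all_samples", "each_sample"]
    rows = [[s, "all", s] for s in samples]
    return header, rows


buffer = io.StringIO()
write_meta_table(buffer, HEADER, ROWS)
meta_text = buffer.getvalue()
print(meta_text)
```

To save the table to disk, pass an open file handle (for example `open("meta.tsv", "w", newline="")`) instead of the in-memory buffer.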

You can load example data by clicking the button.

load_example

Then, click Go to start the analysis.

2. Data Overview

The Data Overview provides basic information about your data, such as the number of taxa, functions, and proportions.

data_overview

data_overview_func

data_overview_filter

3. Set TaxaFunc

set_multi_table

Data Selection

FUNC_prop

Sum Proteins Intensity

If the original table contains a Protein column, click Create Proteins Intensity Table to sum peptide intensities into protein intensities.
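
Conceptually, this step groups peptides by protein and adds up their intensities per sample. The sketch below shows that idea on toy data. It assumes each peptide row names exactly one protein; how MetaX treats peptides shared by several proteins is not described here. The peptide sequences, intensities, and function name are made up for illustration.

```python
from collections import defaultdict


def sum_peptides_to_proteins(peptide_rows, samples):
    """Sum peptide intensities per protein for each sample.

    peptide_rows: list of dicts with a "Protein" key and one key per sample.
    """
    totals = defaultdict(lambda: {s: 0.0 for s in samples})
    for row in peptide_rows:
        protein = row["Protein"]
        for s in samples:
            totals[protein][s] += float(row[s])
    return dict(totals)


# Toy input: sequences and intensities are invented for this example.
peptides = [
    {"Sequence": "PEPTIDEA", "Protein": "MGYG000000001_00696", "sample_1": 10.0, "sample_2": 5.0},
    {"Sequence": "PEPTIDEB", "Protein": "MGYG000000001_00696", "sample_1": 2.0, "sample_2": 1.0},
    {"Sequence": "PEPTIDEC", "Protein": "MGYG000000001_02838", "sample_1": 4.0, "sample_2": 0.0},
]
protein_table = sum_peptides_to_proteins(peptides, ["sample_1", "sample_2"])
print(protein_table)
```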

Data preprocessing

There are several methods for detecting and handling outliers.

For every method, you can choose one meta column to define the groups used for outlier detection and another meta column to define the groups used for handling the outliers.

Outliers can be imputed within each group or across all samples.
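
To make the idea of group-wise detection and imputation concrete, here is a minimal sketch for a single feature. It uses the common 1.5 x IQR rule and replaces each outlier with the mean of the remaining values in its group. Both choices are assumptions for illustration, since the specific methods offered by MetaX are not listed here, and the sketch uses one grouping for both detection and handling.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return booleans marking values outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def impute_by_group(intensities, groups):
    """Detect outliers within each group and replace them with the mean
    of that group's non-outlier values.

    intensities: {sample: value} for one feature.
    groups: {sample: group label} taken from a meta column.
    """
    result = dict(intensities)
    by_group = {}
    for sample, group in groups.items():
        by_group.setdefault(group, []).append(sample)
    for members in by_group.values():
        values = [intensities[s] for s in members]
        flags = iqr_outliers(values)
        kept = [v for v, flagged in zip(values, flags) if not flagged]
        if not kept:
            continue  # nothing left to impute from; leave the group untouched
        fill = statistics.mean(kept)
        for s, flagged in zip(members, flags):
            if flagged:
                result[s] = fill
    return result


# Toy data: values are invented; "s5" is an obvious outlier in group "Treatment".
values = {"s1": 10, "s2": 11, "s3": 12, "s4": 13, "s5": 100,
          "s6": 5, "s7": 5, "s8": 6, "s9": 6, "s10": 5}
groups = {s: ("Treatment" if i < 5 else "Control") for i, s in enumerate(values)}
cleaned = impute_by_group(values, groups)
print(cleaned["s5"])
```

Imputing across all samples corresponds to passing a `groups` mapping in which every sample has the same label.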

If you normalize the data with Z-Score, Mean Centering, or Pareto Scaling, a minimum offset is added to the data afterwards to avoid negative values.
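
These three normalizations center the data, so roughly half of the values become negative, which is why an offset is needed afterwards. The sketch below uses the textbook definitions and then shifts the result so that the smallest value is zero. The exact offset MetaX applies, and whether scaling runs per feature or per sample, are not stated here, so treat both as assumptions.

```python
import math
import statistics


def z_score(values):
    """(x - mean) / sd"""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def mean_center(values):
    """x - mean"""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def pareto_scale(values):
    """(x - mean) / sqrt(sd)"""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


def shift_non_negative(values):
    """Add an offset so the smallest value becomes zero.

    The choice of offset is an assumption made for this sketch.
    """
    offset = -min(values)
    return [v + offset for v in values] if offset > 0 else list(values)


raw = [2.0, 4.0, 6.0]
scaled = z_score(raw)                  # centered, so it contains negative values
shifted = shift_non_negative(scaled)   # all values are now >= 0
print(scaled, shifted)
```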

Then, click Go to create a TaxaFunc object for analysis.

TaxaFunc_ready

You can then check the tables in the Table Review section and export them.

table_review

table_review_open_window

4. Basic Stats

PCA, Correlation and Box Plot

basic_stats_pca

Select meta groups or samples (all by default) to draw PCA, correlation, and box plots for any of the Taxa, Function, Taxa-Func, Peptide, or Protein tables.
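
As a rough picture of what happens behind the sample selection, the sketch below picks the samples that belong to a chosen meta group and computes their pairwise correlation. Pearson correlation is an assumption (the coefficient used by MetaX is not stated here), and the intensity values and function names are made up.

```python
import math


def samples_in_groups(meta, column, wanted):
    """Return sample names whose value in `column` is one of `wanted`."""
    return [s for s, info in meta.items() if info[column] in wanted]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


meta = {
    "sample_1": {"Treatment": "Treatment"},
    "sample_4": {"Treatment": "Control"},
    "sample_5": {"Treatment": "Control"},
}
# Toy intensity table: one list of feature intensities per sample (invented values).
table = {
    "sample_1": [1.0, 2.0, 3.0],
    "sample_4": [2.0, 4.0, 6.0],
    "sample_5": [3.0, 2.0, 1.0],
}

selected = samples_in_groups(meta, "Treatment", {"Control"})
corr = {(a, b): pearson(table[a], table[b]) for a in selected for b in selected}
print(selected)
```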

pca

pca_3d

correlation

boxplot

basic_number

Heatmap and Bar Plot

add_to_list

add_top_list

add_a_list

heatmap_original

basic_stats_bar

basic_stats_bar_setting

Peptide Query

peptide_query

5. Cross Test

T-TEST

t_test

ANOVA-TEST

anova_test

Significant Taxa-Func

Plot Cross Heatmap

t_test_res

corss_heatmap_setting

corss_heatmap

t_test_heatmap

Group-Control TEST

Set one group as "Control", then compare every other group to it.

Bingo! You have found a hidden function of MetaX. Click Help -> About, then click Like 3 times to unlock the comparison of all groups to the control.
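
The comparisons produced by this mode can be pictured as one (group, control) pair per non-control group. A tiny sketch, with a hypothetical helper name; the labels PBS and XYL come from the example meta table and "SUC" is invented:

```python
def control_comparisons(groups, control):
    """Pair every non-control group with the control group."""
    if control not in groups:
        raise ValueError(f"control group {control!r} not found")
    return [(g, control) for g in groups if g != control]


pairs = control_comparisons(["PBS", "XYL", "SUC"], control="PBS")
print(pairs)  # [('XYL', 'PBS'), ('SUC', 'PBS')]
```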

DESeq2

(Ultra-Up and Ultra-Down mark features whose |log2FC| exceeds the Max log2FC setting.)
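
The labelling rule can be sketched as follows. Only the Ultra-Up/Ultra-Down condition (|log2FC| above the maximum) comes from the text above. The default thresholds, the adjusted p-value cutoff, and the "Not significant" label are assumptions chosen for illustration.

```python
def classify(log2fc, padj, min_log2fc=1.0, max_log2fc=5.0, alpha=0.05):
    """Label a feature by the direction and size of its fold change.

    All default thresholds are hypothetical, not MetaX's defaults.
    """
    if padj >= alpha or abs(log2fc) < min_log2fc:
        return "Not significant"
    direction = "Up" if log2fc > 0 else "Down"
    if abs(log2fc) > max_log2fc:
        return f"Ultra-{direction}"
    return direction


print(classify(6.2, 0.001))   # Ultra-Up
print(classify(-2.0, 0.01))   # Down
print(classify(3.0, 0.2))     # Not significant
```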

TUKEY_TEST

tukey_test

taxa_func_linked_only

tukey_plot

6. Expression Analysis

Co-Expression Networks & Heatmap

co_network_pic

bar_switch_satck

bar_to_line

Taxa-Func Network

taxa_func_network

8. Restore Last TaxaFunc Object

Preparing Your Data

Module 2. Database Builder

Note: The results from MetaLab v2.3 MaxQuant workflow do not require database building. However, we do not recommend using these results as input to MetaX, as many peptides may be discarded.

Option 1: Build Database Using MGnify Data

Ensure you download the correct database type corresponding to your data.

dbbuilder

Option 2: Build Database Using Own Data

  1. Annotation Table: A TSV table (tab-separated), with the first column as protein name joined with Genome by "_", e.g., "Genome1_protein1", and other columns containing annotation information.

dbbuilder_own

  2. Taxa Table: A TSV table (tab-separated), with the first column as Genome name, e.g., "Genome1", and the second column as taxa.

Example Annotation Table:

Query Preferred_name EC KEGG_ko
MGYG000000001_00696 mfd - ko:K03723
MGYG000000001_02838 hxlR - -
MGYG000000001_01674 ispG 1.17.7.1,1.17.7.3 ko:K03526
MGYG000000001_02710 glsA 3.5.1.2 ko:K01425
MGYG000000001_01356 mutS2 - ko:K07456
MGYG000000001_02630 - - -
MGYG000000001_02418 ackA 2.7.2.1 ko:K00925
MGYG000000001_00728 atpA 3.6.3.14 ko:K02111
MGYG000000001_00695 pth 3.1.1.29 ko:K01056
MGYG000000001_02907 - - ko:K03086
MGYG000000001_02592 rplC - ko:K02906
MGYG000000001_00137 - - ko:K03480,ko:K03488
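
When preparing your own annotation table, it can help to check that it parses as expected. The sketch below reads a few of the rows above as TSV with the Python standard library. Treating "-" as a missing value and splitting multiple annotations on commas are inferences from the example, and taking everything before the last "_" as the genome name is an assumption based on the "Genome1_protein1" convention. The helper names are hypothetical.

```python
import csv
import io

# A few rows from the example annotation table, written tab-separated.
ANNOTATION_TSV = (
    "Query\tPreferred_name\tEC\tKEGG_ko\n"
    "MGYG000000001_00696\tmfd\t-\tko:K03723\n"
    "MGYG000000001_01674\tispG\t1.17.7.1,1.17.7.3\tko:K03526\n"
    "MGYG000000001_00137\t-\t-\tko:K03480,ko:K03488\n"
)


def split_values(cell):
    """Treat "-" as missing and split comma-separated annotations."""
    return [] if cell in ("", "-") else cell.split(",")


def genome_of(protein_id):
    """Recover the genome name, assuming the last "_" separates it from the protein."""
    return protein_id.rsplit("_", 1)[0]


annotations = {}
for row in csv.DictReader(io.StringIO(ANNOTATION_TSV), delimiter="\t"):
    annotations[row["Query"]] = {
        "genome": genome_of(row["Query"]),
        "EC": split_values(row["EC"]),
        "KEGG_ko": split_values(row["KEGG_ko"]),
    }

print(annotations["MGYG000000001_00137"])
```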

Example Taxa Table:

Genome Lineage
MGYG000000001 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_GCA-900066495;s_GCA-900066495 sp902362365
MGYG000000002 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales;f_Lachnospiraceae;g_Blautia_A;s_Blautia_A faecis
MGYG000000003 d_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;f_Rikenellaceae;g_Alistipes;s_Alistipes shahii
MGYG000000004 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Oscillospirales;f_Ruminococcaceae;g_Anaerotruncus;s_Anaerotruncus colihominis
MGYG000000005 d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_Terrisporobacter;s_Terrisporobacter glycolicus_A
MGYG000000006 d_Bacteria;p_Firmicutes;c_Bacilli;o_Staphylococcales;f_Staphylococcaceae;g_Staphylococcus;s_Staphylococcus xylosus
MGYG000000007 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus intestinalis
MGYG000000008 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus johnsonii
MGYG000000009 d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Ligilactobacillus;s_Ligilactobacillus murinus
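
The taxa table links each genome to a lineage, and the genome prefix of a protein ID links each protein to that genome. The sketch below parses one lineage from the example and looks up the species for a protein. The rank names, the helper name, and the protein ID are hypothetical. The parser accepts both the single-underscore prefixes shown above (`d_`) and double-underscore prefixes (`d__`), since it is not certain which form your table uses.

```python
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Split a ';'-separated lineage into a {rank: name} mapping."""
    taxa = {}
    for part in lineage.split(";"):
        prefix, _, name = part.partition("_")
        # lstrip handles a possible second underscore after the rank letter.
        taxa[RANKS.get(prefix, prefix)] = name.lstrip("_")
    return taxa


# One row from the example taxa table, written tab-separated.
TAXA_TSV = (
    "Genome\tLineage\n"
    "MGYG000000003\td_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;"
    "f_Rikenellaceae;g_Alistipes;s_Alistipes shahii\n"
)

taxa = {}
for line in TAXA_TSV.splitlines()[1:]:
    genome, lineage = line.split("\t")
    taxa[genome] = parse_lineage(lineage)

protein_id = "MGYG000000003_00001"        # hypothetical protein ID
genome = protein_id.rsplit("_", 1)[0]     # genome name precedes the last "_"
print(taxa[genome]["species"])
```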

Module 3. Database Updater

The Database Updater allows updating the database built by the Database Builder or adding more annotations. This step is optional.

db_updater

Option 1: Built-in Mode

We recommend some extended databases, such as dbCAN_seq.

Option 2: TSV Table

Extend the database with new annotations supplied as a TSV table. Ensure that the column separator is a tab and that the first column is the protein name, with the other columns containing function annotations.

Example:

Protein ID COG KEGG ...
MGYG000000001_02630 Function 1 Function 1 ...
MGYG000000001_01475 Function 2 Function 1 ...
MGYG000000001_01539 Function 3 Function 1 ...
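
In effect, the TSV option attaches extra annotation columns to proteins that already exist in the database. The sketch below shows that merge on toy records held in plain dictionaries. It is an illustration of the idea only, not the updater's implementation; skipping proteins that are absent from the database is an assumption, and the function name is hypothetical.

```python
def extend_annotations(database, new_rows, new_columns):
    """Return a copy of `database` with extra annotation columns added.

    database: {protein_id: {column: value}}
    new_rows: list of dicts keyed by "Protein ID" plus the new columns.
    """
    updated = {pid: dict(record) for pid, record in database.items()}
    for row in new_rows:
        pid = row["Protein ID"]
        if pid not in updated:
            continue  # assumption: proteins unknown to the database are skipped
        for column in new_columns:
            updated[pid][column] = row[column]
    return updated


# Toy records; annotation values follow the placeholder style of the example above.
database = {
    "MGYG000000001_02630": {"KEGG_ko": "-"},
    "MGYG000000001_01475": {"KEGG_ko": "-"},
}
new_rows = [
    {"Protein ID": "MGYG000000001_02630", "COG": "Function 1"},
    {"Protein ID": "MGYG000000001_99999", "COG": "Function 9"},  # not in the database
]
extended = extend_annotations(database, new_rows, ["COG"])
print(extended)
```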

Module 4. Peptide Annotator

1. Results from MAG Workflow

These peptide results come from workflows that use metagenome-assembled genomes (MAGs) as the reference database for protein searches, e.g., MetaLab-MAG, MetaLab-DIA, and other workflows that use MAG databases such as MGnify or a customized MAG database.

peptide2taxafunc

Required:

2. Results from MaxQuant Workflow

Peptide results from the MetaLab 2.3 MaxQuant workflow.

peptide2taxafunc_tab2_1

peptide2taxafunc_tab2_2


Developer Tools

Enjoy MetaX

If you have any issues or suggestions, please open a new issue on GitHub.