metagene

API Reference

Complete reference for all functions and classes in the Metagene package.

Table of contents

  1. Data I/O Functions
    1. load_sites()
    2. load_reference()
    3. load_gtf()
  2. Analysis Functions
    1. map_to_transcripts()
    2. normalize_positions()
    3. show_summary_stats()
  3. Plotting Functions
    1. plot_profile()
    2. plot_profile()
    3. plot_binned_statistics()
  4. Utility Functions
    1. annotate_with_features()
    2. calculate_bin_statistics()
  5. Command Line Interface
    1. Main Command
    2. Options
    3. Examples
  6. Data Structures
    1. PyRanges Objects
  7. Error Handling
  8. Type Hints

Data I/O Functions

load_sites()

Load genomic sites from various file formats.

metagene.load_sites(
    input_file_name: str,
    with_header: bool = False,
    meta_col_index: Optional[List[int]] = None
) -> PyRanges

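Parameters:

  input_file_name (str): Path to the input file.
  with_header (bool, default False): Whether the input file has a header line.
  meta_col_index (List[int], default None): Indices of the columns that hold the site coordinates.

Returns:

  PyRanges: The loaded sites.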

Example:

import metagene

# Load BED file without header
sites = metagene.load_sites("sites.bed", meta_col_index=[0, 1, 2, 5])

# Load TSV with header
sites = metagene.load_sites("sites.tsv", with_header=True, meta_col_index=[0, 1, 3])
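
The `meta_col_index` values in the examples above read naturally as 0-based column positions: in a standard six-column BED file, columns 0, 1, 2, and 5 are chromosome, start, end, and strand. The sketch below only illustrates that kind of column selection on one tab-separated line. `pick_columns` is a hypothetical helper, not part of the package, and how `load_sites()` itself interprets three-element versus four-element index lists is an assumption not confirmed here.

```python
def pick_columns(line: str, indices: list[int]) -> list[str]:
    """Return the tab-separated fields of `line` at the given 0-based indices."""
    fields = line.rstrip("\n").split("\t")
    return [fields[i] for i in indices]


# One BED6 record: chrom, start, end, name, score, strand
bed_line = "chr1\t1000\t1001\tsite1\t0\t+\n"

# Same indices as in the BED example above (hypothetical interpretation)
selected = pick_columns(bed_line, [0, 1, 2, 5])
print(selected)
```

Checking a few lines this way before calling `load_sites()` is a cheap guard against pointing the indices at the wrong columns.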

load_reference()

Load built-in reference annotations or list available references.

metagene.load_reference(species: Optional[str] = None) -> Union[PyRanges, dict]

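Parameters:

  species (Optional[str], default None): Name of a built-in reference, such as "GRCh38". When omitted, the available references are listed instead.

Returns:

  PyRanges when a reference name is given; otherwise a dict describing the available references.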

Example:

# List available references
available = metagene.load_reference()

# Load specific reference
reference = metagene.load_reference("GRCh38")

load_gtf()

Load custom GTF/GFF file.

metagene.load_gtf(gtf_file: str) -> PyRanges

Parameters:

  gtf_file (str): Path to a custom GTF/GFF file.

Returns:

  PyRanges: The annotations loaded from the file.


Analysis Functions

map_to_transcripts()

Map genomic sites to transcript coordinates.

metagene.map_to_transcripts(
    input_sites: pr.PyRanges,
    exon_ref: pr.PyRanges
) -> pl.DataFrame

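Parameters:

  input_sites (pr.PyRanges): Genomic sites, as returned by load_sites().
  exon_ref (pr.PyRanges): Reference annotation, as returned by load_reference().

Returns:

  pl.DataFrame: The sites annotated with transcript coordinates.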

Example:

import metagene

# Load data
sites = metagene.load_sites("sites.tsv", with_header=True, meta_col_index=[0, 1, 2])
reference = metagene.load_reference("GRCh38")

# Map to transcripts
annotated = metagene.map_to_transcripts(sites, reference)
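
Conceptually, mapping a site to transcript coordinates means counting exonic bases from the transcript's 5' end up to the site, skipping introns. The following is a minimal sketch of that idea and not the package's implementation: `genomic_to_transcript_pos` is a hypothetical name, and the 0-based, half-open exon intervals are an assumption made for the illustration.

```python
from typing import Optional


def genomic_to_transcript_pos(
    pos: int, exons: list[tuple[int, int]], strand: str
) -> Optional[int]:
    """Convert a 0-based genomic position to a 0-based transcript position.

    `exons` holds 0-based, half-open (start, end) intervals of one transcript.
    Returns None when the position is intronic or outside the transcript.
    """
    # Walk exons in transcription order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # exonic bases already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start
    return None


exons = [(100, 200), (300, 400)]
plus_first = genomic_to_transcript_pos(150, exons, "+")
plus_second = genomic_to_transcript_pos(350, exons, "+")
minus_first = genomic_to_transcript_pos(350, exons, "-")
intronic = genomic_to_transcript_pos(250, exons, "+")
print(plus_first, plus_second, minus_first, intronic)
```

On the minus strand the exon order and the within-exon offset are both reversed, which is why the same genomic position yields different transcript positions on the two strands.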

normalize_positions()

Normalize transcript positions to relative feature positions (0-1 scale).

metagene.normalize_positions(
    annotated_sites: pl.DataFrame,
    split_strategy: str = "median",
    bin_number: int = 100,
    weight_col_index: list[int] | None = None
) -> tuple[pl.DataFrame, dict, tuple]

Parameters:

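Returns:

  tuple[pl.DataFrame, dict, tuple]: The binned data, the statistics, and the split values for the 5'UTR, CDS, and 3'UTR, in that order (gene_bins, gene_stats, and gene_splits in the example below).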

Example:

# annotated_sites is the DataFrame returned by map_to_transcripts()
gene_bins, gene_stats, gene_splits = metagene.normalize_positions(
    annotated_sites,
    split_strategy="median",
    bin_number=100
)
print(f"5'UTR: {gene_splits[0]:.3f}, CDS: {gene_splits[1]:.3f}, 3'UTR: {gene_splits[2]:.3f}")
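
To make the 0-1 scale and the three split values more concrete, here is a minimal sketch of one way such a normalization could work. It rests on assumptions that the source does not confirm: that the "median" strategy takes the median length of each region across transcripts and rescales the three medians to sum to 1, and that bins are equal-width. `median_splits` and `to_bin` are hypothetical names, not package functions.

```python
from statistics import median


def median_splits(region_lengths: list[tuple[int, int, int]]) -> tuple[float, ...]:
    """Proportions of the 0-1 axis given to 5'UTR, CDS, and 3'UTR.

    Assumption: take the median length of each region across transcripts,
    then rescale the three medians so they sum to 1.
    """
    medians = [median(lengths) for lengths in zip(*region_lengths)]
    total = sum(medians)
    return tuple(m / total for m in medians)


def to_bin(region: int, frac_in_region: float, splits, bin_number: int = 100) -> int:
    """Place a site on the 0-1 axis and return its bin index.

    `region` is 0 (5'UTR), 1 (CDS), or 2 (3'UTR); `frac_in_region` is the
    site's relative position inside that region, from 0 to 1.
    """
    region_start = sum(splits[:region])
    x = region_start + frac_in_region * splits[region]
    # Clamp so that x == 1.0 lands in the last bin instead of overflowing.
    return min(int(x * bin_number), bin_number - 1)


# (5'UTR, CDS, 3'UTR) lengths in nucleotides for three hypothetical transcripts
lengths = [(100, 1000, 400), (200, 1200, 800), (300, 2000, 1200)]
splits = median_splits(lengths)
mid_cds_bin = to_bin(1, 0.5, splits)
print(splits, mid_cds_bin)
```

Under these assumptions every transcript is stretched onto the same axis, so a site halfway through any CDS lands in the same bin regardless of the transcript's actual length.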

show_summary_stats()

Display summary statistics for the analysis.

metagene.show_summary_stats(data: PyRanges) -> None

Parameters:


Plotting Functions

plot_profile()

Generate a metagene profile plot.

metagene.plot_profile(
    gene_bins: pl.DataFrame,
    gene_splits: tuple[float, float, float],
    output_path: str,
    figsize: tuple[int, int] = (10, 5)
) -> None

Parameters:

Example:

gene_bins, gene_stats, gene_splits = metagene.normalize_positions(annotated_sites)
metagene.plot_profile(gene_bins, gene_splits, "metagene_plot.png")

plot_profile()

Create detailed metagene profile with customization options.

metagene.plot_profile(
    data: PyRanges,
    output_path: str,
    **kwargs
) -> None

plot_binned_statistics()

Generate binned statistics plot.

metagene.plot_binned_statistics(
    data: PyRanges,
    output_path: str,
    **kwargs
) -> None

Utility Functions

annotate_with_features()

Annotate sites with overlapping genomic features.

metagene.annotate_with_features(
    sites: PyRanges,
    features: PyRanges
) -> PyRanges

calculate_bin_statistics()

Calculate statistics for binned data.

metagene.calculate_bin_statistics(
    data: PyRanges,
    bins: int = 100
) -> PyRanges
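
As a rough illustration of what per-bin statistics can look like, the sketch below counts normalized positions per equal-width bin and derives the fraction of sites in each bin. It is a plain-Python stand-in, not the package's implementation; `bin_counts` is a hypothetical helper, and which statistics `calculate_bin_statistics()` actually reports is not specified in this reference.

```python
from collections import Counter


def bin_counts(positions: list[float], bins: int = 100) -> list[int]:
    """Count how many normalized positions (0-1) fall into each equal-width bin."""
    counts = Counter(min(int(p * bins), bins - 1) for p in positions)
    return [counts.get(i, 0) for i in range(bins)]


positions = [0.05, 0.12, 0.15, 0.5, 0.99, 1.0]
counts = bin_counts(positions, bins=10)

# Fraction of all sites that fall into each bin
fractions = [c / len(positions) for c in counts]
print(counts)
```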

Command Line Interface

Main Command

metagene [OPTIONS]

Options

Option                Type      Description
-i, --input           PATH      Input file path
-o, --output          PATH      Output file path
-r, --reference       TEXT      Built-in reference (e.g., GRCh38)
-g, --gtf             PATH      Custom GTF file
-p, --output-figure   PATH      Output plot file
--region              CHOICE    Region to analyze (all/5utr/cds/3utr)
--bins                INTEGER   Number of bins (default: 100)
--with-header         FLAG      Input file has header
-m, --meta-columns    TEXT      Column indices for coordinates
--list                FLAG      List available references
--download            TEXT      Download reference

Examples

# Basic analysis
metagene -i sites.bed -o results.tsv -r GRCh38

# With custom parameters
metagene -i sites.tsv --with-header -m "1,2,3" -r GRCm39 --bins 200

# List and download references
metagene --list
metagene --download GRCh38
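
When the CLI is driven from a script, it can help to build the argument list programmatically. The sketch below uses only flags listed in the Options table; `build_metagene_command` is a hypothetical helper, not part of the package, and the actual invocation is left commented out because it needs the `metagene` executable on PATH.

```python
import shlex
from typing import Optional


def build_metagene_command(
    input_path: str,
    output_path: str,
    reference: str,
    bins: Optional[int] = None,
    with_header: bool = False,
) -> list[str]:
    """Assemble a `metagene` command line from the documented options."""
    cmd = ["metagene", "-i", input_path, "-o", output_path, "-r", reference]
    if bins is not None:
        cmd += ["--bins", str(bins)]
    if with_header:
        cmd.append("--with-header")
    return cmd


cmd = build_metagene_command("sites.bed", "results.tsv", "GRCh38", bins=200)
command_line = shlex.join(cmd)
print(command_line)

# To run it (requires the metagene CLI to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.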

Data Structures

PyRanges Objects

The package uses PyRanges objects to represent genomic intervals and annotations. Key columns include the standard PyRanges columns Chromosome, Start, and End, plus Strand for stranded data.

Additional columns may be present depending on the analysis step:


Error Handling

The package provides informative error messages for common issues:


Type Hints

All functions include comprehensive type hints for better IDE support and code clarity. Import types as needed:

from typing import List, Optional, Union
import polars as pl
import pyranges as pr