This page documents the meaning of each field (column) used in the variant databases available in PubMind.
A column whose value contains | holds an aggregated value, with one |-separated entry for each record of a variant. The entries appear in the same order across these columns. For example, the order of entries in formatted_reference matches the order in LLM_reasoning and pathogenicity.
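The positional matching described above can be sketched in Python. This is a minimal illustration, not PubMind code: the row values are made up, the helper name split_records is hypothetical, and it assumes that | never appears inside an individual entry.

```python
from typing import Dict, List

# Hypothetical row: the values are invented for illustration only.
row = {
    "formatted_reference": "Ref A|Ref B",
    "LLM_reasoning": "reasoning for A|reasoning for B",
    "pathogenicity": "pathogenic|unknown",
}

# Columns named in the text as sharing the same entry order.
ALIGNED_COLUMNS = ["formatted_reference", "LLM_reasoning", "pathogenicity"]


def split_records(row: Dict[str, str], columns: List[str], sep: str = "|") -> List[Dict[str, str]]:
    """Split |-aggregated columns into one dict per record, matched by position."""
    parts = {col: row[col].split(sep) for col in columns}
    lengths = {len(values) for values in parts.values()}
    if len(lengths) != 1:
        # The columns can only be paired up if they hold the same number of entries.
        raise ValueError(f"columns have different numbers of entries: {sorted(lengths)}")
    n = lengths.pop()
    return [{col: parts[col][i].strip() for col in columns} for i in range(n)]


records = split_records(row, ALIGNED_COLUMNS)
for record in records:
    print(record)
```

Raising an error on mismatched lengths is a design choice for this sketch: silently truncating to the shortest column would pair a reference with the wrong reasoning or classification.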
PVID: Unique PubMind Variant ID.
gene: Name of the gene extracted by the LLM.
formatted_reference: Citation(s) or source(s) of the variant.
MONDO_name_09: Normalized disease name assigned based on 90% similarity with the MONDO human disease database.
MONDO_ID_09: Corresponding MONDO ID for MONDO_name_09.
LLM_reasoning: Reasoning provided by the LLM for the pathogenicity classification.
pathogenicity: LLM-assigned classification (i.e., pathogenic, likely pathogenic, benign, likely benign, unknown).
disease: Disease name extracted from the literature paragraph by the LLM before normalization (for SNV).
related_disease: Disease name extracted from the literature paragraph by the LLM before normalization (for complex variants).
Num_of_record_used: Number of individual records (paragraphs) supporting or mentioning this variant.
Num_of_paper_used: Number of individual papers supporting or mentioning this variant.
pathogenicity_sum: Final pathogenicity assignment across all records.
pathogenicity_score: Quantitative score summarizing the pathogenicity across all records (range: 0–1).
confidence: Confidence level for a variant entry based on how much evidence was collected (range: 0–3).
confidence_criteria: Reasoning or rules behind the confidence level.
RSID: Reference SNP ID from dbSNP.
dna_change: cDNA-level mutation (e.g., 123A>T).
aa_change: Protein-level mutation (e.g., Glu41Lys).
genomic_coord_result: Genomic coordinate(s) inferred from Ensembl transcript mapping. One variant could correspond to multiple transcripts.
parsed_variants: Parsed structured representation of the variant based on genomic_coord_result.
phenotype: Phenotype terms extracted from the literature paragraph by the LLM before normalization.
HPO_term_09: Normalized phenotype term assigned based on 90% similarity with the HPO database.
HPO_ID_09: Corresponding HPO phenotype ID for HPO_term_09.
MONDO_name_counted: Aggregated number of MONDO disease names mentioned across records.
HPO_term_counted: Aggregated number of HPO terms mentioned across records.
MONDO_ID_counted: Aggregated number of MONDO disease IDs mentioned across records.
HPO_ID_counted: Aggregated number of HPO IDs mentioned across records.
PMCID_PMID_counted: Aggregated number of references (PMCID/PMID) for the variant.
gene_fusion: Formatted fusion (e.g., EML4::ALK).
first_gene: First gene of the fusion assigned by the LLM; normally this is the driver gene in the fusion.
partner_gene: Second gene of the fusion assigned by the LLM; normally this is the partner gene.
protein_domains_affected: Functional domains impacted, normally for first_gene.
type: Structural variant type (deletion, duplication, inversion, etc.).
chr_start_end: Genomic coordinates.
gene_affected: Genes overlapped or affected by the SV.
Chr: Chromosome on which the variant is located.
Start: Start position of the variant on the chromosome (1-based coordinate).
End: End position of the variant on the chromosome.
Ref: Reference allele observed in the reference genome.
Alt: Alternate allele identified in the variant record.
MANE_transcript_used: Whether the MANE Select transcript was used for the genomic coordinate (True, False, RSID).
The following columns are ANNOVAR annotations: Func.ensGene, Gene.ensGene, GeneDetail.ensGene, ExonicFunc.ensGene, AAChange.ensGene, CLNALLELEID (clinvar_20240917), CLNDN (clinvar_20240917), CLNDISDB (clinvar_20240917), CLNREVSTAT (clinvar_20240917), CLNSIG (clinvar_20240917), gnomad41_genome_AF, and gnomad41_genome_AF_grpmax.
Please refer to the ANNOVAR documentation for detailed definitions of each annotation field.
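As a rough illustration of the example formats given for gene_fusion (EML4::ALK) and dna_change (123A>T), the sketch below splits a fusion into its two genes and parses a simple substitution. The function names are hypothetical, and the regular expression covers only the single-base substitution style shown in the example (with an optional c. prefix, which is an assumption); real entries may use other notations that this does not handle.

```python
import re
from typing import Optional, Tuple


def split_fusion(gene_fusion: str) -> Tuple[str, str]:
    """Split a fusion such as 'EML4::ALK' into (first_gene, partner_gene)."""
    first, sep, partner = gene_fusion.partition("::")
    if not sep or not first.strip() or not partner.strip():
        raise ValueError(f"not in GENE1::GENE2 form: {gene_fusion!r}")
    return first.strip(), partner.strip()


# Matches only single-base substitutions in the '123A>T' style.
# The optional 'c.' prefix is an assumption, not something the field list states.
SUBSTITUTION = re.compile(r"^(?:c\.)?(\d+)([ACGT])>([ACGT])$")


def parse_substitution(dna_change: str) -> Optional[Tuple[int, str, str]]:
    """Return (position, ref_base, alt_base), or None if the pattern does not match."""
    match = SUBSTITUTION.match(dna_change.strip())
    if match is None:
        return None
    position, ref_base, alt_base = match.groups()
    return int(position), ref_base, alt_base


print(split_fusion("EML4::ALK"))
print(parse_substitution("123A>T"))
```

Returning None for unmatched input, instead of raising, lets a caller skip values written in a notation the pattern does not cover.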
Deleterious: Number of in-silico prediction tools classifying the variant as deleterious based on dbnsfp47a.
Tolerated: Number of in-silico prediction tools classifying the variant as tolerated based on dbnsfp47a.
Unknown: Number of in-silico prediction tools returning unknown or missing results based on dbnsfp47a.
avg_rankscore: Average pathogenicity rank score aggregated from multiple prediction tools (scaled 0–1) based on dbnsfp47a.
If you have questions or suggestions, please visit the GitHub repo.