Manus Skill

Bibliometric Analysis

A reusable skill that turns a BibTeX file into 12 charts, summary statistics, data exports, and a Markdown report template. It is intended as a Python counterpart to R's bibliometrix package.

01

Sample Corpus at a Glance

68 peer-reviewed articles on AI in Auditing (2020–2025)

Summary metrics reported: Documents (68), Sources, Authors, Co-Authors per Document, Keywords, and documents With DOI.
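These corpus-level metrics can be approximated from parsed BibTeX entries with a few lines of standard-library Python. This is a minimal sketch, not the skill's own code. It makes four assumptions:

- Each entry is a dict of lowercase BibTeX field names, as bibtexparser produces.
- "Co-Authors per Document" means the mean number of authors listed per document.
- Keywords live in a `keywords` field, separated by `;` or `,`.
- The helper names `split_authors` and `summary_stats` are hypothetical.

```python
import re
from statistics import mean

def split_authors(author_field):
    # BibTeX joins author names with the word "and"
    return [a.strip() for a in re.split(r"\s+and\s+", author_field) if a.strip()]

def summary_stats(entries):
    """Corpus-level counts from parsed BibTeX entries (hypothetical helper)."""
    authors_per_doc = [split_authors(e.get("author", "")) for e in entries]
    # A source is the journal, or the booktitle for proceedings
    sources = {e.get("journal") or e.get("booktitle") for e in entries} - {None, ""}
    # Assumption: keywords are separated by ";" or "," in a "keywords" field
    keywords = {
        kw.strip().lower()
        for e in entries
        for kw in e.get("keywords", "").replace(",", ";").split(";")
        if kw.strip()
    }
    return {
        "documents": len(entries),
        "sources": len(sources),
        "authors": len({name for names in authors_per_doc for name in names}),
        # Mean number of authors listed per document
        "co_authors_per_doc": round(mean(len(n) for n in authors_per_doc), 2) if entries else 0.0,
        "keywords": len(keywords),
        "with_doi": sum(1 for e in entries if e.get("doi")),
    }
```

Usage example: `summary_stats(entries)` returns a dict that could be written to JSON with `json.dump`.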

02

How It Works

Four-step pipeline from raw BibTeX to publication-ready visualizations

1. Prepare Input: Export a .bib file from Zotero, Mendeley, Scopus, or Web of Science.

2. Run Analysis: Execute the script with optional year filters and custom domain keywords.

3. Review Outputs: Inspect the 12 charts, JSON statistics, and CSV/XLSX data exports.

4. Write Report: Use the Markdown template to build a narrative around the generated visualizations.

[Figure: workflow diagram]
03

Quick Start

Three commands to go from .bib to charts

Install dependencies (once)
sudo pip3 install bibtexparser wordcloud networkx openpyxl
Run the analysis
python /home/ubuntu/skills/bibliometric-analysis/scripts/run_analysis.py \
    my_references.bib \
    ./output \
    --year-min 2020 --year-max 2025
Convert report to Word
pandoc output/report.md -o output/report.docx

Command-Line Options

| Parameter | Required | Description |
|---|---|---|
| bib_file | Yes | Path to the .bib (BibTeX) file |
| output_dir | Yes | Directory for charts and data output |
| --year-min | No | Minimum publication year to include |
| --year-max | No | Maximum publication year to include |
| --domain-keywords | No | Text file with custom keywords (one per line) |
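The command-line options listed above map naturally onto `argparse`. The sketch below is a hypothetical reconstruction from that list, not the contents of `run_analysis.py`. Three details are assumptions:

- The year bounds are treated as inclusive.
- Entries without a parseable year are skipped.
- The helper names `build_parser` and `filter_by_year` are illustrative.

```python
import argparse

def build_parser():
    # Mirrors the documented options; the real script may differ in details
    p = argparse.ArgumentParser(description="Bibliometric analysis of a BibTeX file")
    p.add_argument("bib_file", help="Path to the .bib (BibTeX) file")
    p.add_argument("output_dir", help="Directory for charts and data output")
    p.add_argument("--year-min", type=int, default=None, help="Minimum publication year to include")
    p.add_argument("--year-max", type=int, default=None, help="Maximum publication year to include")
    p.add_argument("--domain-keywords", default=None, help="Text file with custom keywords (one per line)")
    return p

def filter_by_year(entries, year_min=None, year_max=None):
    """Keep entries whose year lies within [year_min, year_max] (inclusive, assumed)."""
    kept = []
    for e in entries:
        try:
            year = int(str(e.get("year", "")).strip()[:4])
        except ValueError:
            continue  # skip entries without a parseable year
        if year_min is not None and year < year_min:
            continue
        if year_max is not None and year > year_max:
            continue
        kept.append(e)
    return kept
```

`argparse` exposes `--year-min` as `args.year_min`. The parsed arguments could therefore be passed straight to `filter_by_year(entries, args.year_min, args.year_max)`.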
04

Sample Output: 12 Charts

Generated from 68 articles on AI in Auditing (2020–2025)

#01
Annual Scientific Production

Bar chart with cumulative line showing publication growth over time.
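The data behind this chart reduces to counting entries per year and taking a running total. This is a minimal sketch under two assumptions: each entry carries a four-digit `year` field, and `annual_production` is a hypothetical name. Filling missing years with zero is a choice made here so that gaps stay visible on the x-axis.

```python
from collections import Counter
from itertools import accumulate

def annual_production(entries):
    """Return (years, counts, cumulative) for a bar + cumulative-line chart."""
    counts = Counter(int(e["year"]) for e in entries if str(e.get("year", "")).isdigit())
    if not counts:
        return [], [], []
    # Include years with zero output so gaps remain visible
    years = list(range(min(counts), max(counts) + 1))
    per_year = [counts.get(y, 0) for y in years]
    return years, per_year, list(accumulate(per_year))
```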

#02
Most Relevant Sources

Top 15 journals and proceedings ranked by article count.
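Ranking sources is a frequency count over a per-entry source name. The sketch follows the design note that proceedings come from `booktitle` fields with a "(Proceedings)" suffix. It assumes an entry without a `journal` falls back to that form. `source_name` and `top_sources` are hypothetical names.

```python
from collections import Counter

def source_name(entry):
    """Journal title, or booktitle tagged as proceedings (fallback order assumed)."""
    if entry.get("journal"):
        return entry["journal"].strip()
    if entry.get("booktitle"):
        return f"{entry['booktitle'].strip()} (Proceedings)"
    return None  # no usable source field

def top_sources(entries, n=15):
    # Count each named source and keep the n most frequent
    counts = Counter(s for s in map(source_name, entries) if s)
    return counts.most_common(n)
```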

#03
Most Productive Authors

Top 15 authors by number of publications in the corpus.
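Author productivity depends on splitting BibTeX `author` fields, which join names with the word "and" and may contain line breaks. The sketch below assumes full counting: every listed author is credited once per paper. Name variants such as "Lee, K." vs "Lee, Kim" are not merged. The function names are hypothetical.

```python
import re
from collections import Counter

def split_authors(author_field):
    # Split on "and" surrounded by any whitespace, then normalize inner spacing
    parts = re.split(r"\s+and\s+", author_field.strip())
    return [" ".join(a.split()) for a in parts if a.strip()]

def top_authors(entries, n=15):
    counts = Counter()
    for e in entries:
        # set() so a name repeated within one entry is counted once
        counts.update(set(split_authors(e.get("author", ""))))
    return counts.most_common(n)
```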

#04
Authors Over Time

Bubble chart showing top authors' production across years.
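The bubble chart needs an author-by-year matrix, where each bubble's size is the article count for that author and year. This minimal sketch builds that matrix for the top authors. It ranks authors only by entries that have a numeric `year`, an assumption of this sketch. `authors_by_year` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

def authors_by_year(entries, top_n=10):
    """Map each top author to {year: article_count} for a bubble chart (sketch)."""
    per_author_year = defaultdict(Counter)
    totals = Counter()
    for e in entries:
        year = str(e.get("year", "")).strip()
        if not year.isdigit():
            continue  # skip entries without a usable year
        names = {n.strip() for n in re.split(r"\s+and\s+", e.get("author", "")) if n.strip()}
        for name in names:
            per_author_year[name][int(year)] += 1
            totals[name] += 1
    return {name: dict(per_author_year[name]) for name, _ in totals.most_common(top_n)}
```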

#05
Most Frequent Keywords

Top 20 domain keywords extracted from titles, abstracts, and author keywords.
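Dictionary-based extraction can be sketched as whole-word matching of each domain term against the lowercased text of each entry. The sketch makes three assumptions: a term is counted at most once per document (document frequency), the searched fields are `title`, `abstract`, and `keywords`, and `keyword_frequencies` is a hypothetical name. Whole-word matching means "audit" does not match "auditing", which is why the sample keyword file lists both.

```python
import re
from collections import Counter

def keyword_frequencies(entries, dictionary, top_n=20):
    """Count in how many documents each dictionary term appears."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in dictionary
    }
    counts = Counter()
    for e in entries:
        text = " ".join(e.get(f, "") for f in ("title", "abstract", "keywords")).lower()
        # Each matching term counts once for this document
        counts.update(term for term, pat in patterns.items() if pat.search(text))
    return counts.most_common(top_n)
```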

#06
Keyword Co-occurrence Network

Network graph linking keywords that appear together in the same document.
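A co-occurrence network is a weighted edge list: every unordered pair of keywords found in the same document adds 1 to that pair's weight. In this minimal sketch, the input is one keyword list per document (for example, from the dictionary matching step). `cooccurrence_edges` and the `min_weight` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(doc_keywords, min_weight=1):
    """Weighted keyword pairs that appear together in the same document."""
    edges = Counter()
    for kws in doc_keywords:
        # sorted(set(...)) gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(kws)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}
```

The resulting dict can be loaded into networkx with `G.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())`.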

#07
Word Cloud

Keywords drawn at sizes proportional to their frequency.

#08
Collaboration Patterns

Distribution of single-authored vs. multi-authored publications.
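The single vs. multi-author split is just a count of names per entry. This sketch assumes that entries with no `author` field fall into neither group. `collaboration_split` is a hypothetical name.

```python
import re

def collaboration_split(entries):
    """Count single-authored vs multi-authored documents."""
    single = multi = 0
    for e in entries:
        names = [a for a in re.split(r"\s+and\s+", e.get("author", "")) if a.strip()]
        if len(names) == 1:
            single += 1
        elif len(names) > 1:
            multi += 1
        # entries with zero authors are ignored
    return {"single_authored": single, "multi_authored": multi}
```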

#09
Co-authorship Network

Collaboration network among the top 20 authors.

#10
Thematic Evolution

How research themes shifted across time periods.
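The design notes say thematic evolution splits the corpus into 3 equal time periods. This sketch assumes "equal" means equal-length year spans; the skill might instead balance document counts. Leftover years go to the earliest spans. `split_periods` and `period_of` are hypothetical names. Keyword counts per period label would then give the themes of each phase.

```python
def split_periods(years, n_periods=3):
    """Split the year range into n contiguous spans of (near-)equal length."""
    lo, hi = min(years), max(years)
    base, extra = divmod(hi - lo + 1, n_periods)
    periods, start = [], lo
    for i in range(n_periods):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer distinct years than requested periods
        periods.append((start, start + length - 1))
        start += length
    return periods

def period_of(year, periods):
    # Return a label such as "2022-2023" for the span containing the year
    for start, end in periods:
        if start <= year <= end:
            return f"{start}-{end}"
    return None
```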

#11
Lotka's Law

Observed author productivity compared with the distribution predicted by Lotka's inverse-square law.
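In its classic form, Lotka's law predicts y_n = y_1 / n^2. Here y_n is the number of authors with exactly n publications and y_1 is the number with one. This minimal sketch tabulates observed counts against that prediction using the classic exponent of 2. It is not necessarily how the skill fits or plots the curve. `lotka_table` is a hypothetical name, and its input maps each author to a paper count.

```python
from collections import Counter

def lotka_table(papers_per_author):
    """Rows of (n, observed authors with n papers, Lotka-expected y1 / n**2)."""
    observed = Counter(papers_per_author.values())  # n -> authors with exactly n papers
    y1 = observed.get(1, 0)
    return [
        (n, observed[n], round(y1 / n**2, 2))
        for n in range(1, max(observed, default=0) + 1)
    ]
```

A real corpus usually deviates from the prediction to some degree. Plotting both columns on log-log axes makes the comparison easy to read.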

#12
Three-Field Plot

Sources × Keywords × Authors relationship visualization.

05

Output Files

Everything the script generates in a single run

chart_01 – chart_12.png

12 publication-quality PNG charts at 150 DPI

bibliometric_stats.json

Summary statistics as structured JSON

bibliometric_data.csv

Cleaned article-level data in CSV format

bibliometric_data.xlsx

Same data exported as Excel workbook

report_template.md

Markdown template with placeholders for commentary

customization.md

Reference guide for tuning keywords, colors, and chart parameters

06

Customization

Adapt the analysis to any research domain

Custom Domain Keywords

Create a plain-text file with one keyword per line to override the default AI + Auditing vocabulary:

keywords.txt
artificial intelligence
machine learning
audit
auditing
accounting
ethics
professional judgment
formation

Then pass it with --domain-keywords keywords.txt
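Reading the keyword file needs only a little normalization. In this sketch, lowercasing, dropping blank lines, and removing duplicates are assumptions; the documented format only says "one keyword per line". Both function names are hypothetical.

```python
from pathlib import Path

def parse_keyword_lines(lines):
    """Trim, lowercase, drop blanks and duplicates, keep first-seen order."""
    keywords = []
    for line in lines:
        kw = line.strip().lower()
        if kw and kw not in keywords:
            keywords.append(kw)
    return keywords

def load_domain_keywords(path):
    # One keyword per line, as described for --domain-keywords
    return parse_keyword_lines(Path(path).read_text(encoding="utf-8").splitlines())
```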

Key Design Decisions

Keywords extracted from titles, abstracts, and author keywords using a domain-specific dictionary

Journal names cleaned of LaTeX artifacts ({}, \&, \textbackslash)

Conference proceedings from booktitle fields included with a "(Proceedings)" suffix

Thematic evolution automatically splits the corpus into 3 equal time periods

All charts use 150 DPI, white backgrounds, and Material Design color palette
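The LaTeX clean-up of journal names listed above can be sketched with plain string replacement. This version replaces `\&` with `&`, drops `\textbackslash` and protective braces, and collapses whitespace. `clean_source_name` is a hypothetical name, and the skill's actual rules may cover more cases.

```python
import re

def clean_source_name(raw):
    """Strip common LaTeX residue from a BibTeX journal/booktitle value."""
    text = raw.replace(r"\&", "&")              # escaped ampersand
    text = text.replace(r"\textbackslash", "")  # literal-backslash command
    text = re.sub(r"[{}]", "", text)            # protective braces
    return " ".join(text.split())               # collapse whitespace
```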