My GSoC 2026 Proposal

HumanAI Foundation: AI-Powered Behavioral Analysis, Crisis Signal Detection, Funding Intelligence, and Team Communication Processing

April 7, 2026 · 19 min read · By heramb s.
Tags: gsoc, competition, AI

1. Abstract

AI-Powered Behavioral Analysis for Suicide Prevention, Substance Use, and Mental Health Crisis Detection with Longitudinal Geospatial Crisis Trend Analysis

AI4MH aims to monitor aggregated public sentiment across counties to identify early indicators of mental health crises - specifically language patterns associated with depression, suicidal ideation, and substance use.

The prototype phase established a BERT-based sentiment classifier trained on aggregated Reddit and Twitter data, with evaluation metrics and a geocoded dashboard in place. This submission builds on that foundation, focusing on the governance and signal validation layer that sits above the model - determining when and how its outputs warrant escalation.


AI-Powered Funding Intelligence (FOA Ingestion + Semantic Tagging)

Funding opportunity announcements (FOAs) live in a messy ecosystem of government portals, PDFs, and dynamic web applications. Research offices still spend hours manually skimming calls, copying deadlines, and guessing whether a grant aligns with their work.

In my previous ISSR cycle with HumanAI, I built an end-to-end prototype that ingests Grants.gov and NSF FOAs, normalizes their structure, and enriches them with semantic tags using local embeddings and LLM reasoning. For GSoC 2026, I want to turn that prototype into a reusable Funding Intelligence service for the HumanAI ecosystem.

The project will:

  • Harden ingestion with multi-source, layout-aware parsing
  • Formalize a typed schema with Pydantic validation
  • Add a semantic matching engine backed by a vector database
  • Ship a minimal but genuinely usable web interface for researchers and admins

Team Communication Processing and Audio Enhancement in Simulated Human-Factors Environments

This project explores improving the quality and usability of team-based communication data within simulated human-factors environments. It identifies and evaluates suitable audio/video datasets, performs structured exploratory assessment, and develops audio enhancement techniques aimed at reducing noise, improving clarity, and preserving conversational nuances without distorting the original signal.


2. About Me and Why I Am the Right Fit

During my time with HumanAI, I completed four projects - each forcing me to take messy, research-grade requirements and turn them into modular pipelines with clear interfaces, typed models, and evaluation hooks.

Team Communication Processing

This project tackled one of the hardest signal-processing problems: extracting clean, usable speech from unscripted meeting audio captured in varied acoustic environments. The AMI Meeting Corpus served as the benchmark.

I implemented a staged acoustic processing pipeline:

  • Spectral noise profiling
  • Butterworth bandpass filtering centered on the human vocal range (300 Hz – 3.4 kHz)
  • RMS gain normalization feeding into downstream Speaker Diarization via Pyannote.audio and transcription via OpenAI Whisper

Beyond SNR, I incorporated PESQ and STOI metrics to give a perceptual quality signal that pure signal-to-noise ratios miss.

Key Engineering Lesson: Decoupled, independently swappable modules are not a nice-to-have - they are what makes it possible to calibrate each stage without cascading regressions across the whole pipeline.


AI4MH - Mental Health Crisis Detection

The AI4MH project required a fundamentally different mindset: instead of optimizing a quality metric, I had to reason carefully about false positives in a public-health context where a false alarm has real institutional cost.

I designed a multi-stage pipeline:

  • A HuggingFace Transformer layer for contextual sentiment extraction (capturing sarcasm and negation that rule-based models miss)
  • A signal construction stage aggregating Sentiment Index (weight 0.40), Volume Spike Score (weight 0.35), and Temporal Acceleration (weight 0.25) across 6-hour county-level windows
  • A Dual-Model Consensus Gate that cross-checks the transformer output against VADER

Most Important Governance Decision: Requiring agreement between models before escalation catches both model hallucinations and input quality issues in a single check.

Additional features included Mahalanobis distance-based OOD detection, DBSCAN-based geo-clustering, and MinHash LSH to strip bot-coordinated posting spikes.


Humanlike AI Systems and Trust Attribution (TrustEval)

TrustEval asked: how does the linguistic persona of an AI model affect how much humans trust and act on its recommendations? I built a production-grade experimental platform in Next.js using a Backend-for-Frontend (BFF) architecture - keeping API keys server-side and preventing subject-side manipulation.

The platform used:

  • A seeded randomizer for balanced cohort assignment
  • A PostgreSQL schema (via Supabase) for centralized data collection at scale
  • Browser Performance API for sub-millisecond latency instrumentation
  • A researcher-facing Admin Dashboard with live Trust Delta visibility

FOA Prototype

The FOA prototype is the most direct evidence of readiness. I implemented:

  • Grants.gov REST API ingestion and NSF scraping
  • Normalization of heterogeneous source fields into a unified JSON/CSV schema
  • A three-layer semantic tagging stack: regex rules → local all-MiniLM-L6-v2 embeddings → Gemini-based LLM reasoning

The limitations I found - no table extraction from PDFs, no vector database, fragile single-source scraping, no UI - are exactly what this GSoC proposal addresses.


3. Problem Statement and Community Value

Project 1: Team Communication Processing and Audio Enhancement

Modern collaborative systems increasingly rely on recorded audio and video interactions to understand team dynamics and performance outcomes, yet real-world communication data is often noisy, inconsistent, and difficult to process accurately. This project addresses that gap by establishing a systematic approach to dataset identification, validation, and audio enhancement.

Community value emerges in the form of improved tools for analyzing team interactions across domains such as aviation, healthcare, remote work, and emergency response - ultimately enabling systems that enhance collaboration efficiency, reduce human error, and support better decision-making in high-stakes environments.


Project 2: AI-Driven Crisis Signal Detection and Governance Framework for Mental Health Monitoring

Public health systems increasingly look toward digital signals to identify early indicators of mental health crises, yet distinguishing genuine risk patterns from misleading noise remains a complex, high-stakes challenge.

This project addresses the need for a principled crisis detection layer that integrates multiple indicators into a coherent scoring mechanism, supported by safeguards such as thresholding, smoothing, and human-in-the-loop escalation.

Community value lies in strengthening the ability of public health agencies to respond proactively and responsibly to emerging mental health risks.


Project 3: Intelligent Ingestion & Semantic Discovery for Grant Opportunities

The modern research funding ecosystem is fragmented across numerous platforms, each publishing funding opportunities in inconsistent formats. Traditional keyword-based search fails to capture the nuanced intent and interdisciplinary scope of funding announcements.

This challenge requires a system capable of transforming unstructured and diverse funding data into a structured, semantically enriched, and queryable format, enabling precise discovery and reducing the cognitive and operational burden on researchers and institutions.


Project 4: Scalable Framework for Quantifying AI Trust through Linguistic Persona Variation

As AI systems become increasingly embedded in decision-making, understanding how users perceive and trust these systems has become critical yet underexplored. The primary problem is the absence of a standardized, scalable, and secure framework for conducting controlled behavioral experiments that isolate the impact of linguistic variation on user trust.


4. Background and Related Work

My journey has been centered around building scalable, real-world systems at the intersection of AI, data processing, and product design, with a strong focus on execution over theory.

Currently at Autonmis, I’m building a real-time incident tracking and workflow system that enables cross-team communication, status visibility, and structured execution. This involves developing data-driven frontend systems (dashboards, metrics, filters), implementing Kanban-based workflow tracking, and integrating worker-driven validation pipelines to ensure consistency and reliability. I’m also designing a DAG-based workflow engine to support pipeline-oriented execution and scalable orchestration of incidents and tasks.

Previously, I built Archyve, a decentralized research access platform serving 6,000+ users. My work focused on designing end-to-end data pipelines and building systems that transform fragmented research into structured, accessible knowledge. I also worked on features like a DAO directory and a Chrome extension to improve discoverability and access to research.

With MeTTaPay, built solo during ETHGlobal 2025 (an international hackathon), I attempted to build an AI-driven transaction pipeline that converts natural language into executable blockchain operations. This involved designing multi-step reasoning flows, handling ambiguity in user intent, and exploring reliability challenges in automated on-chain execution systems.

Beyond this, I’ve explored systems-level thinking through Nemo OS and built full-stack analytics platforms such as a placement visualizer handling 5,000+ records. I’m strong in full-stack development (TypeScript, Next.js, MongoDB, Redis) and have working experience in Rust, particularly in systems and Web3 contexts.

I also lead the college club prismlabs, managing a team of developers and organizing events, and I am a fellow at @starknet, @_theresidency, and @buildspace. These roles have strengthened my ability to design and ship systems under real-world constraints.


5. Goals, Non-Goals, and Success Criteria

Core Goals

  • Design and implement scalable, modular systems that handle heterogeneous data sources - structured documents, unstructured text, behavioral signals, or audio streams
  • Build robust data processing pipelines that include ingestion, validation, transformation, and enrichment
  • Incorporate intelligent reasoning layers combining deterministic logic, statistical methods, and AI-driven techniques
  • Ensure system integrity through secure architectures, emphasizing server-side control and prevention of manipulation
  • Develop researcher-facing interfaces for visualization, querying, filtering, or monitoring
  • Maintain strong focus on reproducibility, documentation, and testing

Non-Goals

  • Not building full end-to-end enterprise platforms or replacing existing large-scale systems
  • Not prioritizing heavy UI/UX complexity over system correctness and functionality
  • Not relying on large-scale model training where simpler, interpretable methods suffice
  • Not automating critical human decision-making in sensitive domains - human-in-the-loop oversight is essential
  • Not creating overly generalized solutions at the cost of depth

Success Criteria

  • Systems demonstrate measurable effectiveness in their core function
  • High reliability and consistency in outputs, with the majority of processed data passing validation checks
  • Clear evidence of scalability without significant performance degradation
  • Reproducible workflows supported by clean code, structured documentation, and well-defined testing strategies
  • Practical usability validated through small-scale testing or pilot usage
  • Codebases that are modular, maintainable, and ready for further extension

6. System Overview and Architecture

Project 2: AI-Driven Crisis Signal Detection and Governance Framework

1. Sentiment Extraction: The Semantic Layer

The first processing stage involves extracting meaning from raw social media text using a fine-tuned HuggingFace Transformer model (e.g., mental-roberta-base or a domain-adapted variant). This model performs token-level contextual analysis and produces a sentiment intensity score along with a semantic embedding vector.

Why a transformer specifically? Rule-based models fail on negation, sarcasm, and emotionally dense language. "I can't take this anymore" reads differently in isolation versus in a thread about exam stress - transformers capture this with attention over context.

Output at this stage:

  • Sentiment polarity: positive, neutral, negative, crisis-adjacent
  • Intensity score: continuous value in [-1.0, 1.0]
  • Embedding vector: stored for downstream drift detection

2. Signal Construction

A raw sentiment score is not a signal. A signal is a pattern that becomes meaningful when measured across dimensions.

Sentiment Index (Si) - Posts are bucketed per county per 6-hour window. An exponentially weighted moving average (α = 0.3) reduces sensitivity to single outliers. Only windows with n ≥ 30 posts qualify; windows below this threshold are flagged as statistically insufficient.
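
A minimal sketch of the smoothing step, using α = 0.3 and the n ≥ 30 cutoff from the text. The function name, the use of None as the "statistically insufficient" flag, and the choice to leave the running average untouched for skipped windows are assumptions made for illustration.

```python
from typing import List, Optional

ALPHA = 0.3      # smoothing factor from the text
MIN_POSTS = 30   # minimum posts for a window to qualify


def smooth_sentiment(window_means: List[float],
                     window_counts: List[int],
                     alpha: float = ALPHA,
                     min_posts: int = MIN_POSTS) -> List[Optional[float]]:
    """Return one smoothed value per window; None marks an insufficient window."""
    smoothed: List[Optional[float]] = []
    previous: Optional[float] = None
    for mean, count in zip(window_means, window_counts):
        if count < min_posts:
            # Assumption: an insufficient window is flagged and does not
            # update the running average.
            smoothed.append(None)
            continue
        if previous is None:
            previous = mean  # first qualifying window seeds the average
        else:
            previous = alpha * mean + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed
```

With α = 0.3 a single extreme window moves the index by only 30% of its deviation, which is the damping behaviour the stage relies on.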

Volume Spike Detection - A baseline rolling average is computed over the prior 14 days for each county. A spike is flagged when current volume exceeds μ + 2σ (two standard deviations above baseline).
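
The μ + 2σ rule can be sketched with the standard library alone. The function name is hypothetical, and the use of the population standard deviation over the baseline windows is an assumption; the text does not say which estimator is intended.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_volume_spike(baseline_counts: Sequence[int],
                    current_count: int,
                    k: float = 2.0) -> bool:
    """True when current_count exceeds mean + k * std of the baseline windows."""
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)  # assumption: population std deviation
    return current_count > mu + k * sigma
```

For a 14-day baseline of 6-hour windows, baseline_counts would hold 56 values per county. A county with a perfectly flat baseline has σ = 0, so any increase counts as a spike; a production version would likely need a minimum σ floor.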

Temporal Acceleration - Beyond raw volume, the rate of change matters. A 40% increase in volume over 6 hours carries a different signal weight than the same increase spread gradually over 72 hours. Temporal acceleration is computed as the second derivative of volume over the rolling window.
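
On discrete 6-hour windows, one way to approximate the second derivative is the second difference of the volume series. This is a sketch under that assumption; the function name is illustrative.

```python
from typing import List, Sequence


def temporal_acceleration(volumes: Sequence[float]) -> List[float]:
    """Discrete second difference: v[t] - 2 * v[t-1] + v[t-2]."""
    return [volumes[t] - 2 * volumes[t - 1] + volumes[t - 2]
            for t in range(2, len(volumes))]
```

A sudden jump such as [100, 100, 140] yields an acceleration of 40, while steady linear growth such as [100, 110, 120, 130, 140] yields zero throughout, which is the distinction between abrupt and gradual increases that this feature is meant to capture.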

The three features are combined using a weighted sum:

signal_score = (0.4 × Si) + (0.35 × Vs) + (0.25 × Ta)

Weights are configurable and should be validated against historical labeled events before deployment.
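
A sketch of the weighted sum with configurable weights. The class and function names are hypothetical, and two things are assumptions rather than part of the design above: that each component is already normalized to a common scale, and that the weights are required to sum to 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalWeights:
    sentiment: float = 0.40     # weight on Si
    volume: float = 0.35        # weight on Vs
    acceleration: float = 0.25  # weight on Ta

    def __post_init__(self) -> None:
        total = self.sentiment + self.volume + self.acceleration
        # Assumption: weights must sum to 1 so scores stay comparable
        # across configurations.
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights must sum to 1.0, got {total}")


def signal_score(si: float, vs: float, ta: float,
                 weights: SignalWeights = SignalWeights()) -> float:
    """Weighted sum of the three signal components."""
    return (weights.sentiment * si
            + weights.volume * vs
            + weights.acceleration * ta)
```

Keeping the weights in one validated object makes it straightforward to log the exact configuration alongside each decision when the weights are re-tuned against labeled events.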

Geographic Clustering Score - Three counties showing elevated negative sentiment simultaneously is more significant than one county alone. A spatial co-occurrence bonus is applied when adjacent or nearby counties breach signal thresholds within the same 6-hour window:

adjusted_score = signal_score × 1.25   if Cadj ≥ 2
adjusted_score = signal_score          otherwise

Adjacency is defined using a pre-loaded county adjacency graph (FIPS code pairs), computed once from Census boundary data. The multiplier is capped - it can boost a signal over the monitoring threshold but cannot alone push a signal to urgent status.
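
The co-occurrence bonus and its cap might look like the sketch below. The adjacency graph is represented as a dict of FIPS-style strings, the county codes in the usage example are placeholders, and urgent_threshold and epsilon are hypothetical parameters; the design does not fix their values here.

```python
from typing import Dict, Set

CLUSTER_MULTIPLIER = 1.25   # from the text
MIN_ADJACENT_BREACHES = 2   # the Cadj >= 2 condition


def apply_cluster_bonus(county: str,
                        score: float,
                        adjacency: Dict[str, Set[str]],
                        breaching: Set[str],
                        urgent_threshold: float,
                        epsilon: float = 1e-6) -> float:
    """Boost score when enough adjacent counties breach in the same window."""
    c_adj = len(adjacency.get(county, set()) & breaching)
    if c_adj < MIN_ADJACENT_BREACHES:
        return score
    boosted = score * CLUSTER_MULTIPLIER
    if score < urgent_threshold:
        # Cap: the bonus alone must not carry a signal into urgent status.
        boosted = max(score, min(boosted, urgent_threshold - epsilon))
    return boosted
```

For example, with adjacency = {"00001": {"00002", "00003", "00004"}} and breaching = {"00002", "00003"}, a score of 0.5 becomes 0.625, while a score of 0.7 against an urgent threshold of 0.8 is held just below 0.8 instead of rising to 0.875.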


3. Signal Validation: Consensus & Confidence Estimation

[Figure: Semantic and Consensus Architecture]

A signal derived from a single model is not trustworthy enough to route toward human review. A fast secondary check is run using VADER (Valence Aware Dictionary and Sentiment Reasoner).

The purpose is not to replace the transformer but to act as a consensus gate:

  • If HF Transformer scores a post cluster as highly negative and VADER does as well → signal proceeds
  • If HF Transformer says highly negative and VADER judges it as positive or neutral → high variance flag is raised

When variance between the two models exceeds a defined threshold (absolute score difference > 0.45), the signal is discarded for this cycle.

Why VADER specifically? VADER is designed for short-form social media text. It accounts for capitalization ("I AM DONE" vs. "i am done"), punctuation emphasis ("no!!!" vs. "no"), and common internet intensifiers - making it a good fit for a fast consensus check on social media posts.

Confidence Score Formula:

confidence = 1 − |theta_HF − theta_VADER| / 2.0

A perfectly agreeing pair will have confidence = 1.0. Maximum disagreement will have confidence = 0.0. Only signals with confidence ≥ 0.6 advance. This threshold is configurable and logged with every decision.
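
A sketch of the consensus gate, assuming both model scores lie in [-1.0, 1.0]. Confidence is computed from the absolute difference between the two scores, which is what makes perfect agreement yield 1.0 and maximum disagreement (a gap of 2.0) yield 0.0. Function and constant names are illustrative.

```python
VARIANCE_THRESHOLD = 0.45  # absolute score difference that discards a signal
MIN_CONFIDENCE = 0.6       # minimum confidence for a signal to advance


def consensus_confidence(theta_hf: float, theta_vader: float) -> float:
    """1.0 for identical scores, 0.0 for scores at opposite ends of [-1, 1]."""
    return 1.0 - abs(theta_hf - theta_vader) / 2.0


def passes_consensus_gate(theta_hf: float, theta_vader: float) -> bool:
    """Apply the variance check first, then the confidence threshold."""
    if abs(theta_hf - theta_vader) > VARIANCE_THRESHOLD:
        return False  # high-variance flag: discard for this cycle
    return consensus_confidence(theta_hf, theta_vader) >= MIN_CONFIDENCE
```

With these default values the variance check is the stricter of the two: a difference of 0.45 corresponds to a confidence of 0.775, so the 0.6 threshold only becomes the binding constraint if the variance threshold is loosened above 0.8.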


4. Semantic Drift & Out-of-Distribution (OOD) Detection

[Figure: Semantic and OOD Detection Architecture]

Transformer models degrade - sometimes silently - when input contains heavy slang, emoji substitution, or gibberish from bot accounts or corrupted scrapes. These inputs produce embeddings that cluster far from the model's training distribution.

Density Estimation - A reference distribution of embeddings from known valid training posts is stored. New embeddings are scored using Mahalanobis distance:

DM(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ))

Language Detection Pre-filter - Before the OOD check, langdetect or fasttext language ID runs. Non-English posts (or posts below 60% English token confidence) are excluded from the transformer pipeline entirely.

Entropy Check - The entropy of the model's softmax output distribution is checked. A confident prediction concentrates probability mass on one class. If entropy is high, the model is uncertain and the post is treated as uninformative.
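
The entropy check can be sketched as follows. Entropy is normalized by log(number of classes) so the result lies in [0, 1] regardless of class count; the 0.8 cutoff and the function names are hypothetical, since the design does not specify a threshold.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by its maximum, log(len(probs))."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def is_uninformative(logits: Sequence[float],
                     max_entropy: float = 0.8) -> bool:
    """True when the prediction is too spread out to trust (assumed cutoff)."""
    return normalized_entropy(softmax(logits)) > max_entropy
```

A uniform prediction over the four polarity classes has normalized entropy 1.0 and is dropped, while a prediction dominated by one class scores close to 0 and passes.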


5. Bot & Coordinated Activity Detection

[Figure: Bot Detection Architecture]

Volume spikes are not always the result of a genuine crisis. Coordinated bot activity can produce spikes that mimic community distress. Four layers of detection filter these out:

Geographic Concentration Check - A legitimate crisis signal will be distributed across a county. If a spike appears to originate from a small geographic area or a single subnet range, the signal value is reduced proportionally. This uses the DBSCAN algorithm - posts within a time window are clustered by user location, and a spike dominated by one dense cluster is treated as concentrated.

Repetition & Duplication Fingerprinting - Near-duplicate posts are identified via MinHash LSH on n-grams. If more than 30% of posts in a spike window share high similarity (Jaccard similarity > 0.8), template-based or automated posting is indicated and posts are excluded.
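
To make the 30% / 0.8 rule concrete, the sketch below computes exact Jaccard similarity on word 3-gram shingles. This is a deliberate stand-in: MinHash LSH approximates the same similarity without the pairwise comparison used here, which is only practical for small windows. All function names and the shingle size are assumptions.

```python
from itertools import combinations
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Set of word n-grams; short texts collapse to a single shingle."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Intersection over union; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duplicate_share(posts: List[str], sim_threshold: float = 0.8) -> float:
    """Fraction of posts with at least one near-duplicate in the window."""
    sets = [shingles(p) for p in posts]
    flagged: Set[int] = set()
    for i, j in combinations(range(len(posts)), 2):
        if jaccard(sets[i], sets[j]) > sim_threshold:
            flagged.update((i, j))
    return len(flagged) / len(posts) if posts else 0.0


def looks_templated(posts: List[str], share_threshold: float = 0.30) -> bool:
    """True when more than 30% of posts in the window are near-duplicates."""
    return duplicate_share(posts) > share_threshold
```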

Velocity Fingerprinting - A user posting more than 8 times per hour with high content similarity is indicative of a bot and excluded. The combination of high velocity + high content similarity triggers exclusion, not either alone.

Cross-County Injection Detection - If a post cluster appears in multiple counties within a small time window, it is flagged as a coordinated injection. All posts are flagged and their contribution to volume spike calculation is nullified. The event is logged as a "coordinated_amplification_event".


6. Signal Filtering & Bias Adjustment

Bias Correction - Urban counties naturally produce higher post volumes, creating an inherent advantage in signal detection. Signal scores are normalized against county-level baseline engagement rates. A rural county with 40 crisis-adjacent posts in a window may represent a proportionally larger signal.

Signal Categorization - Validated signals are tagged by category based on dominant language patterns:

  • suicide_ideation: direct or indirect references
  • substance_distress: drug/alcohol-related negative sentiment
  • general_crisis: acute negative sentiment without specific categorization

This categorization determines which human reviewer receives the escalation.

Audit Log - Every signal (whether escalated, monitored, or discarded) generates an immutable audit log entry containing: timestamp, county, composite score, model outputs, variance flag status, OOD flag status, bot flag status, and the routing decision made.
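
One possible shape for an audit entry, carrying the fields listed above. The class and field names are illustrative. Note the limits of this sketch: a frozen dataclass only prevents attribute reassignment in memory, so true immutability would still depend on an append-only store, which is outside its scope.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class AuditLogEntry:
    timestamp: str                   # ISO 8601, UTC
    county: str                      # county identifier, e.g. a FIPS code
    composite_score: float
    model_outputs: Dict[str, float]  # e.g. {"hf": ..., "vader": ...}
    variance_flag: bool
    ood_flag: bool
    bot_flag: bool
    routing_decision: str            # escalated, monitored, or discarded

    def to_json(self) -> str:
        """Serialize with sorted keys so entries are stable to diff or hash."""
        return json.dumps(asdict(self), sort_keys=True)
```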


7. Decision Logic & Escalation Routing

After filtering, each signal carries a final composite score in [0, 100]. Three routes exist, matching the outcomes recorded in the audit log: escalated, monitored, or discarded.

[Figure: Decision Logic Architecture]

In the Analyst Review Interface, the analyst sees the signal score, contributing factors, sample anonymized posts, county map view, and the audit trail for that signal. They can approve escalation, flag for monitoring, or reject with a categorized reason.


8. Governance Reflection

The system could surface false positives to the State Behavioral Health Office, triggering resource allocation toward non-existent crises - decreasing institutional trust in the tool. The subtler risk is the inverse: a system tuned to avoid false positives may under-report genuine signals, creating false confidence.

The Most Important Safeguard: The dual-model consensus gate directly addresses the core failure mode for noisy or misleading input. Requiring agreement between a deep contextual model and a fast lexical model before a signal advances catches both model hallucinations and input quality issues. Everything else in the system (bias correction, bot detection, OOD filtering) reduces noise at the edges. The consensus gate protects the core.


Project 3: Intelligent Ingestion & Semantic Discovery for Grant Opportunities

A. Ingestion Layer

  • Dynamic scrapers: Headless browser or REST API bypass for SPAs
  • Layout-aware PDF extraction: PyMuPDF or LayoutLM for structural analysis - headers, tables, footnotes - rather than raw text buffers
  • Queued parallel crawling: Celery or asyncio-based task queue capable of processing hundreds of URLs concurrently without rate-limit violations

B. Normalization and Validation Layer

  • Schema mapping: Source-specific field names mapped to a canonical JSON schema
  • Typed validation: Pydantic models enforce date formats, currency ranges, and required IDs
  • Cross-source deduplication: Detects when the same FOA appears on multiple platforms and merges rather than duplicates
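
A rough sketch of what typed validation enforces at this layer. To stay dependency-free it uses a standard-library dataclass as a stand-in for the Pydantic models named above, and every field name here (foa_id, close_date, award_ceiling, and so on) is a hypothetical example, not the canonical schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FundingOpportunity:
    foa_id: str
    title: str
    source: str
    close_date: date
    award_ceiling: Optional[float] = None

    def __post_init__(self) -> None:
        if not self.foa_id.strip():
            raise ValueError("foa_id is required")
        if self.award_ceiling is not None and self.award_ceiling < 0:
            raise ValueError("award_ceiling must be non-negative")


def parse_foa(raw: Dict[str, Any]) -> FundingOpportunity:
    """Coerce a source record into the typed model; raises ValueError on bad data."""
    ceiling = raw.get("award_ceiling")
    return FundingOpportunity(
        foa_id=str(raw.get("foa_id", "")),
        title=str(raw.get("title", "")).strip(),
        source=str(raw.get("source", "")),
        # Only ISO dates (YYYY-MM-DD) are accepted in this sketch.
        close_date=date.fromisoformat(str(raw.get("close_date", ""))),
        award_ceiling=float(ceiling) if ceiling is not None else None,
    )
```

Rejecting a record at parse time, rather than storing it with a malformed deadline, keeps downstream search and export from having to re-check every field.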

C. Semantic Enrichment Layer

  • Rule engine: Fast regex tagging for explicit mentions (e.g., K-12 populations, rural areas)
  • Vector similarity: Local all-MiniLM-L6-v2 inference mapping FOA content to NIH/CASRAI ontology nodes
  • LLM contextualization: Gemini-based secondary pass to surface latent interdisciplinary properties
  • LLM summarization: Concise, researcher-focused abstract generation

D. Persistence and Interface Layer

  • Vector database: FAISS or ChromaDB for rapid semantic retrieval
  • Relational store: PostgreSQL for structured metadata, deadlines, and audit logs
  • API: FastAPI endpoints for search, filter, and export
  • Web dashboard: React/Vite frontend for non-technical research staff

Project 4: Scalable Framework for Quantifying AI Trust (TrustEval)

The system is designed as a controlled experimental infrastructure measuring how linguistic variations in AI responses influence user trust and decision-making. The architecture follows a Backend-for-Frontend (BFF) design to ensure all sensitive logic remains server-side.

Core Components:

  1. Experiment Engine - Assigns users to conditions (A, B, or multiple variants), maintains state across sessions, ensures reproducibility
  2. Persona Templating System - Defines linguistic styles (technical, persuasive, neutral), injects structured prompts into LLM requests
  3. Telemetry and Instrumentation Module - Captures high-resolution behavioral data using browser APIs for sub-millisecond precision
  4. Security and Integrity Layer - Ensures experiment variables are not exposed to participants, protects API keys, validates incoming data
  5. Researcher Dashboard - Provides aggregated metrics (trust scores, acceptance ratios, behavioral patterns) with filtering by persona, condition, or cohort
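
Reproducible, balanced assignment in the Experiment Engine could be sketched like this. It assumes the participant list is known up front (batch assignment); the function name is hypothetical, and the persona labels in the usage note are the styles named above.

```python
import random
from typing import Dict, Sequence


def assign_conditions(participant_ids: Sequence[str],
                      conditions: Sequence[str],
                      seed: int) -> Dict[str, str]:
    """Shuffle with a seeded RNG, then deal conditions round-robin."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Round-robin keeps group sizes within one participant of each other.
    return {pid: conditions[i % len(conditions)]
            for i, pid in enumerate(shuffled)}
```

Calling assign_conditions(ids, ["technical", "persuasive", "neutral"], seed=42) twice returns the same mapping, so an assignment can be re-derived from the logged seed when auditing an experiment run.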

Data Flow:

  1. User initiates experiment and frontend requests condition assignment
  2. Backend assigns persona and injects prompt before calling the LLM
  3. Response is returned to frontend and displayed to the user
  4. User interaction is captured and telemetry is logged
  5. Data is sent to backend and stored in a structured database
  6. Researchers access aggregated insights through the dashboard

Project 1: Team Communication Processing and Audio Enhancement

This system transforms raw team communication data into high-quality, analysis-ready audio through structured preprocessing and enhancement pipelines.

Pipeline stages:

  1. Data Acquisition Layer - Sources multi-speaker audio/video datasets, handles ingestion of raw files and metadata
  2. Preprocessing Layer - Converts audio into standardized formats, segments long recordings, applies noise profiling and baseline filtering
  3. Enhancement Layer - Implements noise reduction, filtering, and signal amplification while preserving speech characteristics
  4. Evaluation Layer - Compares original and enhanced audio using objective and qualitative metrics (PESQ, STOI)
  5. Output and Reproducibility Layer - Stores processed samples and analysis results; provides Jupyter notebooks for reproducible workflows

7. Conclusion

Across these projects, the underlying approach consistently focuses on building structured, reliable, and scalable systems that transform complex, unstructured inputs into meaningful and actionable outputs. Whether it is enhancing raw communication signals, detecting sensitive mental health trends, enabling semantic discovery in fragmented information ecosystems, or rigorously evaluating human trust in AI systems - each project addresses a different layer of the broader challenge of making intelligent systems more interpretable, trustworthy, and usable in real-world contexts.

A strong emphasis has been placed on modular architecture, reproducibility, and responsible design - ensuring that each solution is not only technically sound but also aligned with practical deployment constraints and ethical considerations.

AI Usage Disclosure: AI tools were used selectively to assist with drafting, structuring, and refining the language of this proposal. All system design decisions, architecture, and technical content were independently conceptualized and validated. The final submission reflects original thinking, with AI serving only as a supporting tool for clarity and presentation.


GitHub Repositories

  • FOA Ingestion & Semantic Discovery: GSoC-2026-ISSR_FOA_Project
  • AI Trust Experiment (TrustEval): GSoC-2026-trust-experiment
  • Team Communication Processing: GSoC-2026-Team-Communication-submission