Status: Systems_Optimal

I build systems
that think.

Exploring the frontier of AI, Infrastructure, and Systems Engineering. Architecting the bridges between neural logic and computational scale.

Voice latency

14ms

Inference nodes

08

Model contexts

12.4M

Node Alpha
99.98%
EFFICIENCY
Flux Rate
1.4 TB/s
STABLE

// Manifest_v5.0

The_ Architecture

Core_Engineering

High-Performance ML

Optimizing inference pipelines for sub-millisecond real-time applications. Bridging the gap between theory and production.

Systems_Design

Vector + RAG Infrastructure

Migrated ChromaDB → Qdrant with hybrid dense-sparse search. Added semantic caching and connection pooling across the orchestration layer.

Research_Frontier

Domain Fine-tuning

LoRA / PEFT adaptations on Llama 3.x for narrow, consumer-grade AI verticals. 10k+ term datasets, shipped weights.
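
As a rough illustration of what a LoRA adaptation does, here is a minimal pure-Python sketch of the low-rank update W + (alpha / r) · B · A. The matrices, the helper names (`matmul`, `lora_merge`, `trainable_params`), and the values are all assumptions for the example, not the training code behind the shipped weights.

```python
# Illustrative LoRA update on a tiny weight matrix, in pure Python.
# Shapes and values are made up for the example.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight after adaptation."""
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Parameters trained by LoRA for one matrix: r * (d_in + d_out)."""
    return r * (d_in + d_out)

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # frozen base weight, 2 x 3
A = [[1.0, 2.0, 3.0]]                    # r x d_in  = 1 x 3
B = [[0.5], [0.0]]                       # d_out x r = 2 x 1

merged = lora_merge(W, A, B, alpha=2, r=1)
```

Only A and B are trained; the base weight stays frozen, which is why a rank-16 adapter on a 4096 x 4096 matrix trains `trainable_params(4096, 4096, 16)` values instead of the full matrix.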

// Interactive_Scroll · 06 Chapters

The_ Pipeline

Scroll through six beats of the voice-agent pipeline. Each chapter pins a node. The graph lights up as you move.

Chapter 01 / 06
01
// Capture_node

Voice_Capture

Raw PCM audio streams in over LiveKit. No buffering, no round-trip. Every 20ms frame is eligible for inference.

Frame size · 20ms
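
A minimal sketch of the 20ms framing, assuming 48 kHz, 16-bit, mono PCM (the page does not state the audio format). `frame_bytes` and `iter_frames` are hypothetical helpers, not LiveKit APIs.

```python
# Assumed audio format: 48 kHz, 16-bit, mono. Only the 20 ms frame size
# comes from the page.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
FRAME_MS = 20

def frame_bytes(sample_rate=SAMPLE_RATE_HZ, frame_ms=FRAME_MS,
                bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Size in bytes of one PCM frame of `frame_ms` milliseconds."""
    samples = sample_rate * frame_ms // 1000
    return samples * bytes_per_sample * channels

def iter_frames(pcm, size):
    """Yield complete frames; a trailing partial frame is not yielded."""
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]

size = frame_bytes()
chunk = bytes(size * 3 + 100)          # three full frames plus leftovers
frames = list(iter_frames(chunk, size))
```

Under these assumptions each frame is 960 samples; the 100 leftover bytes would be carried over to the next chunk by the caller.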
Chapter 02 / 06
02
// Transcribe_node

Stream_Transcription

Rolling ASR window; partial transcripts flush to the orchestrator before the speaker finishes the sentence.

First token · 180ms
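
A toy sketch of the partial-flush idea: every update pushes the current window to a callback instead of waiting for the end of the sentence. The ASR model is stubbed out (each `feed` call stands in for one newly decoded word), and `RollingTranscriber` and `on_partial` are hypothetical names.

```python
from collections import deque

class RollingTranscriber:
    """Keeps the last `window` decoded words and flushes a partial on every update."""

    def __init__(self, on_partial, window=3):
        self.on_partial = on_partial          # orchestrator callback (hypothetical)
        self.words = deque(maxlen=window)     # old words fall off the left

    def feed(self, word):
        """Accept one newly decoded word and flush the current window."""
        self.words.append(word)
        self.on_partial(" ".join(self.words))


partials = []
asr = RollingTranscriber(on_partial=partials.append, window=3)
for word in "book a table for two".split():
    asr.feed(word)
```

The orchestrator sees "book", then "book a", and so on, so downstream stages can start before the speaker finishes.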
Chapter 03 / 06
03
// Embed_node

Dense_Sparse_Embed

Each partial is embedded twice — dense for semantics, sparse for lexical recall. Hybrid rank, not reranked.

Dim · 768 + BM25
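
The page does not say how the dense and sparse rankings are combined. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption; `rrf_fuse`, the document ids, and `k=60` are illustrative.

```python
def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    """Fuse two ranked lists of document ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document; ties break by id.
    """
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["d1", "d2", "d3"]    # order from the 768-dim dense search
sparse = ["d3", "d1", "d4"]   # order from the BM25-style sparse search
fused = rrf_fuse(dense, sparse)
```

RRF only needs ranks, not comparable scores, so no separate reranking model is involved.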
Chapter 04 / 06
04
// Retrieve_node

Qdrant_Retrieval

Migrated off ChromaDB. Qdrant hybrid search + semantic cache cuts retrieval latency by 90%. Pooled connections, pre-warmed shards.

Latency · −90%
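
A minimal sketch of a semantic cache, assuming it works by cosine similarity against earlier query embeddings with a fixed threshold. The class name, the 0.95 threshold, and the linear scan are assumptions; a production cache would likely use an index.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Returns a stored result when a new query embedding is close enough."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, result)

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # cache miss: caller falls through to the vector store

    def put(self, embedding, result):
        self.entries.append((embedding, result))


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "hits-A")
near = cache.get([0.99, 0.05])   # almost the same direction
far = cache.get([0.0, 1.0])      # orthogonal query
```

A hit skips the vector store entirely; the threshold trades stale or slightly-off answers against hit rate.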
Chapter 05 / 06
05
// Generate_node

LLM_Orchestration

Function-calling LLM fuses context, tools, and user intent. Cached prompts and speculative decoding shave another 60% off round-trip.

API calls · −60%
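
A minimal sketch of the function-calling step, assuming the model emits a JSON object with `name` and `arguments`. The registry, the decorator, and the `get_opening_hours` tool are hypothetical; real providers each define their own tool-call schema.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_opening_hours(day):
    # Hypothetical tool; a real one would query a backend.
    return f"Open 9-17 on {day}"

def dispatch(tool_call_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return fn(**call.get("arguments", {}))


reply = dispatch('{"name": "get_opening_hours", "arguments": {"day": "Monday"}}')
missing = dispatch('{"name": "x", "arguments": {}}')
```

The tool result is then fed back to the model as context for the spoken answer.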
Chapter 06 / 06
06
// Respond_node

Voice_Synthesis

SNAC-codec TTS streams audio back within the same LiveKit session. End-to-end, from input to speech: under a 14ms ceiling.

E2E · 14ms

Manifest_v5.0

THE FORGE

Incubating neural architectures and decentralized systems. A gallery of synthetic intelligence.

Featured
Llama 3.2 3B · 2025

Orpheus TTS

A language model where the 'language' is sound

LLAMA 3.2 3B · LORA · UNSLOTH · SNAC CODEC · +4
19min · 1 epoch, 299 steps, single GPU
FastAPI · July

Smart Pathshala

Offline-first AI school ecosystem for Tier-2 and Tier-3 India

FASTAPI · REACT · POSTGRESQL · QDRANT · +1
4 · Admin, Teacher, Parent, Student
Private
Live Demo
LLM · Apri

ShetNiyojan

Intelligent agricultural planning from seed to sale

LLM · FLASK · MONGODB
1st · Winner at Devclash 2025, DY Patil
Flask · 2025

WarCast

AI-based defense news aggregator with sentiment analysis and summaries

FLASK · PYTHON · DISTILBERT · BART · +2
10+ · Global defense publishers aggregated in real time
BERTSum · Marc

Legify

AI-powered legal document simplification and Q&A

BERTSUM · FAISS · DJANGO · TTS/STT
1st · Winner at Synapse 2.0, CCOEW
5 projects
System_Status: Operational

THE MATRIX

A live map of the technical stack — from model training to production inference, data pipelines, and frontend delivery.

Neural_Nodes

Technical Proficiency Graph
Python · PyTorch · LoRA · Llama 3 · llama.cpp · FastAPI · LiveKit · Qdrant · FAISS · MongoDB · React · Next.js · Docker
CORE STACK
AI / ML
8 modules
Python · PyTorch · Llama 3.x · LoRA / PEFT · Unsloth · SNAC Codec · BERTSum · CrewAI
Inference & Serving
6 modules
llama.cpp · GGUF / Q4_K_M · FastAPI · Flask · LiveKit · LLM Streaming
Data & Vector
6 modules
Qdrant · FAISS · ChromaDB · PostgreSQL · MongoDB · RAG Pipelines
Frontend & Infra
6 modules
React · Next.js · Svelte · Django · Docker · Lightning AI / GPU
Chronological_Expansion

The_Pipeline

VoiceraCX

AI Intern

June 2025 — Present

Voice agent platform with 3s response latency, a ChromaDB bottleneck, and redundant LLM API calls across the pipeline.

−60% Latency
−90% Query time
−60% API calls

// PROCESS

01

Built real-time voice pipelines using LiveKit

02

Migrated vector storage from ChromaDB → Qdrant with hybrid dense-sparse search

03

Implemented LLM streaming inference, connection pooling, and smart caching
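
Connection pooling from the steps above can be sketched as follows. This is a minimal, generic pool built on `queue.Queue`; `ConnectionPool` and `FakeConnection` are hypothetical stand-ins, not the client classes used in the actual pipeline.

```python
import queue
from contextlib import contextmanager

class FakeConnection:
    """Stand-in for an expensive-to-open client connection."""

class ConnectionPool:
    """Opens `size` connections up front and lends them out one at a time."""

    def __init__(self, factory, size=4):
        self._idle = queue.Queue(maxsize=size)
        self.created = 0
        for _ in range(size):
            self._idle.put(factory())
            self.created += 1

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks up to `timeout` seconds if every connection is in use,
        # then raises queue.Empty.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


pool = ConnectionPool(FakeConnection, size=2)
with pool.connection() as first:
    with pool.connection() as second:
        distinct = first is not second
```

Requests reuse warm connections instead of paying setup cost per call, and the fixed size caps concurrent load on the backend.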

DAOStreet

Software Development Intern

Feb 2025 — June 2025

Web application built on Svelte with open tickets across UI/UX and feature development.

Consistent Tickets
Improved UI quality

// PROCESS

01

Solved development tickets across the Svelte codebase

02

Debugged and enhanced UI/UX components for responsiveness

Alesa AI Ltd, UK

AI/ML Intern

Nov 2024 — Mar 2025

Astrology platform needing domain-specific AI for dream interpretation with fine-tuned language models.

10k+ Dataset
Llama 3.1-8B Model

// PROCESS

01

Worked on Tangent Mind — tarot, horoscopes, dream interpretation platform

02

Fine-tuned Llama-3.1-8B using PEFT on a dataset of 10,000+ dream-related terms

Terminal / Transmission

Establish Connection

Send a message through the neural gateway or route through the authenticated social nodes. The form validates client-side and opens a prefilled message in your default mail client.
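
The validate-then-prefill flow can be sketched like this. The site itself presumably does this in browser code; the Python below only illustrates the logic, and the field names, the email regex, and the `hello@example.com` recipient are placeholders.

```python
import re
from urllib.parse import quote

# Deliberately loose check: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def build_mailto(to, sender_name, sender_email, message):
    """Validate the form fields and return a prefilled mailto: URL."""
    if not sender_name.strip():
        raise ValueError("name is required")
    if not EMAIL_RE.match(sender_email):
        raise ValueError("invalid email address")
    if not message.strip():
        raise ValueError("message is required")
    subject = f"Message from {sender_name.strip()}"
    body = f"{message.strip()}\n\nReply to: {sender_email}"
    # Percent-encode so spaces and newlines survive in the URL.
    return f"mailto:{to}?subject={quote(subject)}&body={quote(body)}"


url = build_mailto("hello@example.com", "Ada", "ada@example.com", "Hi")
```

Nothing is sent by the page itself: the mail client opens with the subject and body filled in, and the visitor presses send.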

system_session_transmit
Mail Client Relay / AES-256 Styled

Pune Link Node

18.52°N / 73.86°E

Neural_Status

LOC: Pune, Maharashtra
TZ: UTC+05:30
AVAILABILITY: High_Priority