Hugging Face Adds Private Datasets to the Open ASR Leaderboard to Fight Benchmark Gaming
Hugging Face introduces private ASR evaluation datasets from Appen Inc. and DataoceanAI to curb benchmark gaming, with scores visible via an opt-in toggle.
Academic papers, novel architectures, training techniques, and fundamental AI research breakthroughs.
OpenAI and five hardware partners release MRC through the Open Compute Project to reduce congestion and hardware-fault disruptions in large GPU clusters.
GitHub user erogol's BlaGPT offers an open-source research sandbox for evaluating LM architectures and components on compact datasets.
A new architecture called SubQ targets 12-million-token context windows while sidestepping the quadratic compute scaling that limits standard transformers.
A ternary-weight 1.7B model achieves 442 tokens per second on an Apple M4 Max, demonstrating how ultra-compact weight encoding translates into real-world on-device inference speed.
A new arXiv preprint examines whether known large language model biases can be deliberately exploited to distort AI-generated search summaries.
A peer-reviewed Harvard and Beth Israel study finds OpenAI's o1 model achieved accurate triage diagnoses in 67% of cases versus 50–55% for attending physicians.
Google DeepMind's AI co-clinician made zero critical errors in 97 of 98 simulated clinical queries, outperforming tools already in routine physician use.
David Silver, who built AlphaGo at DeepMind, argues large language models are fundamentally capped by human data and has founded Ineffable Intelligence to pursue reinforcement learning instead.