AI Native Developer News

Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR

arrol is an online rollout pruning method for Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. It prunes rollouts early, while they are still being generated, which the authors report both speeds up training and improves accuracy, giving developers a more efficient way to optimize LLMs.

arXiv CS.CL·5d ago

ai-coding-tools ai-models ai-research

A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

The paper presents A-SelecT, a technique for automatic timestep selection in Diffusion Transformer (DiT) representation learning, aimed at improving training efficiency and representational capacity. A-SelecT identifies the most informative timesteps in a single run, removing the need for an exhaustive search. Experimental results indicate that DiT with A-SelecT outperforms previous diffusion models on both classification and segmentation tasks. This points to the potential of generative pre-training for discriminative tasks.

arXiv CS.AI·2d ago

ai-research

Liberate your OpenClaw

The article discusses enhancements to the OpenClaw framework, which offers AI developers new capabilities and optimizations for building applications. These improvements are crucial for advancing productivity and efficiency in AI workflows.

Hugging Face Blog·5d ago

ai-coding-tools open-source

v1.3.34-vscode

The v1.3.34 release of the VSCode-based tool introduces essential updates aimed at improving security, user experience, and compatibility with new AI features. Notably, the addition of Tensorix as an LLM provider enhances the tool's capabilities for AI developers.

Continue.dev Changelog·6d ago

ai-coding-tools open-source

All the latest in AI ‘music’

The article covers the growing influence of AI in the music industry, highlighting key developments such as Suno's latest funding round at a $2.45 billion valuation amid looming lawsuits, and its partnerships with major music labels like Universal Music and Warner Music Group. Apple Music and Qobuz are introducing features to label and detect AI-generated music, while Bandcamp has become the first major platform to ban AI content entirely. Despite these developments, musicians remain seriously concerned about the impact of AI on their livelihoods and on the authenticity of music creation.

The Verge - AI·2d ago

ai-news

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

The article presents a new benchmark methodology for evaluating repository-aware software engineering systems, addressing issues of synthetic task design and prompt leakage. It uses a time-consistent approach, evaluating code knowledge generated before a specific time (T0) on engineering tasks derived from future pull requests. The study reported baseline results using three Claude-family models across two open-source repositories, DragonFly and React, achieving file-level F1 scores of 0.8081 and 0.8078 respectively, demonstrating the importance of prompt construction as a benchmark variable.

arXiv CS.SE·2d ago

ai-research
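
For readers unfamiliar with the metric, a file-level F1 score can be sketched as follows. This is a minimal illustration that assumes the usual set-based definition (the files a system predicts it must change versus the files the real pull request changed); the summary above does not spell out the paper's exact formula. The function name and file paths are hypothetical.

```python
def file_level_f1(predicted_files, actual_files):
    """Set-based F1 between predicted and actually changed files.

    Assumption: this is the standard precision/recall definition,
    not necessarily the exact formula used in the paper.
    """
    predicted = set(predicted_files)
    actual = set(actual_files)

    # Convention chosen here: two empty sets count as a perfect match.
    if not predicted and not actual:
        return 1.0

    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predicted files match the real PR.
score = file_level_f1(
    ["src/a.py", "src/b.py", "src/c.py"],
    ["src/a.py", "src/b.py", "src/d.py"],
)
```

Here precision and recall are both 2/3, so the F1 score is also 2/3. A benchmark-level figure would typically average such per-task scores, though that aggregation is likewise an assumption.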

A Benchmark for Evaluating Repository-Level Code Agents with Intermediate Reasoning on Feature Addition Task

The article introduces RACE-bench, a new benchmark for evaluating repository-level code agents on feature addition tasks, consisting of 528 instances sourced from 12 open-source repositories. The benchmark evaluates agents not just on patch correctness but also on the quality of their intermediate reasoning, and finds that success rates across agents range from 29% to 70%. Analysis indicates that while agents are good at understanding high-level intent, they struggle to translate that intent into concrete implementation steps: in cases where the patch applied successfully but tests failed, reasoning recall decreased by 35.7% and over-prediction increased by 94.1%. These findings emphasize the need to evaluate code agents on more than the correctness of their final outputs.

arXiv CS.SE·2d ago

ai-research

SWE-PRBench: Benchmarking AI Code Review Quality Against Pull Request Feedback

The article introduces SWE-PRBench, a benchmark of 350 pull requests used to evaluate AI code review quality. The assessments reveal that eight advanced models detect only 15% to 31% of the issues flagged by human reviewers, indicating that AI code review still falls well short of human performance. The study examined three configurations for providing context and found that models consistently performed worse as context increased. The best-performing models achieved mean scores between 0.147 and 0.153, with a clear gap to the remaining models, which scored 0.113 or lower. The dataset and evaluation framework are publicly accessible.

arXiv CS.SE·2d ago

ai-research ai-models

Salesforce announces an AI-heavy makeover for Slack, with 30 new features

Salesforce has unveiled a comprehensive AI-driven update for Slack, introducing 30 new features aimed at enhancing user experience and productivity. Key advancements include improved search functionalities, smarter context-aware suggestions, and integrations with Salesforce's CRM capabilities. The new features are designed to facilitate better communication and collaboration in workspace environments, thereby making Slack considerably more useful for daily operations.

TechCrunch - AI·7h ago

ai-news

Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.

Google has launched two tools to improve the performance of coding agents that generate outdated Gemini API code. The Gemini API Docs MCP aims to make code generation more accurate by giving agents access to up-to-date documentation, while the Agent Skills tool focuses on training agents to improve their code output. Both are intended to address problems caused by the training-data cutoff date of these agents, so that they can produce relevant and current code for developers.

Google Developers Blog·just now

ai-coding-tools ai-frameworks frontier-labs