AI Native Medhavi

AI Native Developer News

AI development tools, research, and industry news — clustered and ranked by importance.

Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.

Google has launched two tools aimed at coding agents that generate outdated Gemini API code because of their training-data cutoffs. The Gemini API Docs MCP improves the accuracy of code generation by giving agents access to up-to-date documentation, while Agent Skills guides agents toward better code output. Together, the tools are intended to ensure agents produce relevant, current code for developers.

Google Developers Blog·just now
ai-coding-tools·ai-frameworks·frontier-labs
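A documentation MCP server of this kind essentially exposes a lookup tool the agent can call before writing code. A minimal illustrative sketch in plain Python (the topic names and doc snippets below are hypothetical stand-ins, not the real Gemini API Docs MCP contents):

```python
# Hypothetical, in-memory stand-in for a docs MCP server: maps a topic to the
# current documentation snippet the agent should rely on instead of its
# (possibly stale) training-data knowledge.
DOCS = {
    "text-generation": "Use the v2 generate endpoint; the v1 endpoint "
                       "is deprecated.",
}

def lookup_docs(topic: str) -> str:
    """Tool call an agent makes to refresh training-cutoff-era knowledge."""
    return DOCS.get(topic, "No entry found; consult the official reference.")

print(lookup_docs("text-generation"))
```

In a real setup the agent's MCP client would discover this as a tool and invoke it over the protocol; the point is that answers come from live documentation rather than the model's frozen weights.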
Visual Studio Code 1.114

Visual Studio Code 1.114 introduces features aimed at improving the development experience, particularly for developers integrating AI coding tools into their workflows. The update may streamline AI-assisted development, potentially improving productivity and efficiency.

VS Code Blog·just now
ai-coding-tools·open-source
Alexa+ gets new food ordering experiences with Uber Eats and Grubhub

Amazon has added food ordering through Uber Eats and Grubhub to Alexa+, letting users place orders by voice. The interaction is designed to feel like conversing with a waiter in a restaurant or ordering at a drive-thru. The update continues Amazon's expansion of Alexa into food services, positioning it more competitively in the smart-assistant market as voice-activated ordering grows.

TechCrunch - AI·12h ago
ai-news
Salesforce announces an AI-heavy makeover for Slack, with 30 new features

Salesforce has unveiled a comprehensive AI-driven update for Slack, introducing 30 new features aimed at enhancing user experience and productivity. Key advancements include improved search functionalities, smarter context-aware suggestions, and integrations with Salesforce's CRM capabilities. The new features are designed to facilitate better communication and collaboration in workspace environments, thereby making Slack considerably more useful for daily operations.

TechCrunch - AI·7h ago
ai-news
datasette-enrichments-llm 0.2a0

The article discusses the release of version 0.2a0 of datasette-enrichments-llm, a tool that enhances Datasette with capabilities powered by large language models (LLMs). This update introduces several new features focused on improving data enrichment and usability within the Datasette framework. Key improvements include enhanced support for structured data and more intuitive interfaces. The update positions datasette-enrichments-llm as a valuable resource for developers seeking to leverage LLMs in their data applications.

Simon Willison·2h ago
ai-coding-tools·ai-frameworks·open-source
An Empirical Recipe for Universal Phone Recognition

The research paper presents PhoneticXEUS, a phone recognition model trained on extensive multilingual data, achieving state-of-the-art performance with 17.7% PFER on multilingual tasks and 10.6% PFER on accented English speech. The study identifies key factors that affect performance in multilingual phone recognition, including data scale, architecture, and training objectives. By conducting controlled ablations across 100+ languages, the paper quantifies the effects of SSL representations and analyzes error patterns related to language families and articulatory features. The authors have made all data and code openly accessible for further research.

arXiv CS.CL·2h ago
ai-research
Towards Explainable Stakeholder-Aware Requirements Prioritisation in Aged-Care Digital Health

This paper explores the human aspects shaping requirement prioritization in aged-care digital health through a mixed-methods study involving 103 older adults, 105 developers, and 41 caregivers. By employing explainable machine learning, the study identified key human factors related to requirement priorities across eight themes, revealing significant misalignment among stakeholder groups. The research contributes an explainable, human-centric requirements engineering framework, enhancing the inclusiveness of requirements analysis by explicitly engaging various stakeholder perspectives.

arXiv CS.SE·2h ago
ai-research
Knowledge database development by large language models for countermeasures against viruses and marine toxins

This research paper presents the development of comprehensive databases for therapeutic countermeasures against five viruses (Lassa, Marburg, Ebola, Nipah, and Venezuelan equine encephalitis) and marine toxins by utilizing two large language models (LLMs), ChatGPT and Grok. The LLMs were used to identify relevant public databases, collect pertinent information, and cross-validate this data to create user-friendly interactive webpages. The study emphasizes the effectiveness of LLMs in building scalable and updatable knowledge databases that facilitate evidence-based decision-making in medical research.

arXiv CS.AI·2h ago
ai-research
Running local models on Macs gets faster with Ollama's MLX support

Ollama has introduced support for Apple's open-source MLX framework for machine learning, enhancing its runtime system for operating large language models on local computers. This update improves caching performance and adds support for Nvidia's NVFP4 format for model compression, which optimizes memory usage. These enhancements are expected to significantly boost performance on Macs with Apple Silicon chips (M1 or later). The surge in interest for local models, exemplified by OpenClaw's rapid rise to over 300,000 stars on GitHub, underscores the growing trend in using local computing resources for machine learning.

Ars Technica - AI·7h ago
ai-frameworks·ai-models
llm-all-models-async 0.1

The article discusses the release of llm-all-models-async 0.1, an initial version aimed at making language models easier and more performant to use in asynchronous environments. It introduces an API that facilitates smoother operation when working with multiple models concurrently. Specific performance metrics are not detailed; the focus is on helping developers make better use of language models in their applications and streamlining AI-assisted development workflows.

Simon Willison·9h ago
ai-models·ai-frameworks
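The core idea of an async multi-model workflow can be sketched with plain asyncio (a hedged illustration only, not the plugin's actual API; `fake_model` stands in for a real model call):

```python
import asyncio

async def fake_model(name: str, prompt: str) -> str:
    # Stand-in for a real async model call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"{name}: echo {prompt}"

async def ask_all(models: list[str], prompt: str) -> list[str]:
    # gather() awaits all model calls concurrently instead of one at a time,
    # so total latency is roughly that of the slowest model, not the sum.
    return await asyncio.gather(*(fake_model(m, prompt) for m in models))

results = asyncio.run(ask_all(["model-a", "model-b"], "hello"))
print(results)
```

Because `gather` preserves argument order, the responses line up with the model list even though the calls overlap in time.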
GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

The article introduces GISTBench, a benchmark designed for evaluating Large Language Models' (LLMs) capabilities in understanding user interests within recommendation systems through their interaction histories. It features two new metric families: Interest Groundedness (IG), which includes precision and recall components to penalize hallucination while rewarding coverage, and Interest Specificity (IS), which assesses distinctiveness of LLM-generated user profiles. A synthetic dataset based on real user interactions is released, containing implicit and explicit engagement signals, with validation against user surveys. The evaluation of eight open-weight LLMs, ranging from 7B to 120B parameters, uncovers significant performance bottlenecks, particularly in counting and attributing engagement signals.

arXiv CS.AI·2h ago
ai-research·ai-models
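The precision/recall intuition behind Interest Groundedness can be shown in a few lines (a simplified sketch under my own assumptions, not the benchmark's actual metric definition):

```python
def interest_groundedness(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision penalizes hallucinated interests; recall rewards coverage."""
    if not predicted or not actual:
        return 0.0, 0.0
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted)  # fraction of predictions grounded in evidence
    recall = true_positives / len(actual)        # fraction of true interests recovered
    return precision, recall

p, r = interest_groundedness({"cycling", "jazz", "chess"}, {"cycling", "jazz", "cooking"})
# "chess" is a hallucination and "cooking" is missed, so both scores drop to 2/3
```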
SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

The article presents SciVisAgentBench, a new benchmark for evaluating scientific data analysis and visualization agents, developed in response to the need for a principled evaluation framework in this rapidly evolving field. This benchmark includes 108 expert-crafted cases covering multiple scenarios and is structured around four dimensions: application domain, data type, complexity level, and visualization operation. It introduces a multimodal evaluation pipeline that combines human judgment with various deterministic evaluation methods. A validity study involving 12 SciVis experts was conducted to explore the agreement between human and LLM judges, establishing initial baselines and identifying capability gaps in current SciVis agents.

arXiv CS.AI·2h ago
ai-research·ai-models
ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

ChartDiff presents a large-scale benchmark for cross-chart comparative summarization, comprising 8,541 chart pairs sourced from diverse datasets and visual styles. The benchmark includes LLM-generated and human-verified summaries that assess differences in trends, fluctuations, and anomalies. Evaluation results indicate that general-purpose models achieve the highest GPT-based quality, while specialized models perform better in ROUGE scores but struggle with human-aligned evaluation. The study finds that multi-series charts pose challenges regardless of model type, highlighting the difficulties in comparative chart reasoning for current vision-language models.

arXiv CS.AI·2h ago
ai-research
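The ROUGE-vs-human gap reported here is easier to see given how ROUGE works: it scores surface n-gram overlap, not meaning. A minimal ROUGE-1 recall sketch (simplified; real ROUGE implementations add stemming, F-scores, and longer n-grams):

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: clipped overlapping word count / reference word count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[word], count) for word, count in ref.items())
    return overlap / sum(ref.values())

score = rouge1_recall("sales rose sharply in 2023", "sales rose in 2023")
# every reference word appears in the candidate, so recall is 1.0
```

A summary can copy reference wording (high ROUGE) while misstating the trend, which is one way n-gram metrics and human-aligned judgments diverge on comparative chart summaries.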

Latest

  • Visual Studio Code 1.114
    VS Code Blog·656m ago
  • Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.
    Google Developers Blog·410m ago
  • Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos
    arXiv CS.SE·2h ago
  • Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping
    arXiv CS.AI·2h ago
  • GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
    arXiv CS.AI·2h ago

  • SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents
    arXiv CS.AI·2h ago
  • SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
    arXiv CS.CL·2h ago
  • Compiling Code LLMs into Lightweight Executables
    arXiv CS.SE·2h ago
  • HackRep: A Large-Scale Dataset of GitHub Hackathon Projects
    arXiv CS.SE·2h ago
  • Dual Perspectives in Emotion Attribution: A Generator-Interpreter Framework for Cross-Cultural Analysis of Emotion in LLMs
    arXiv CS.CL·2h ago
  • From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories
    arXiv CS.CL·2h ago
  • Practical Feasibility of Sustainable Software Engineering Tools and Techniques
    arXiv CS.SE·2h ago
  • ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts
    arXiv CS.AI·2h ago
  • Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
    arXiv CS.CL·2h ago
  • Concept Training for Human-Aligned Language Models
    arXiv CS.CL·2h ago
  • BayesInsights: Modelling Software Delivery and Developer Experience with Bayesian Networks at Bloomberg
    arXiv CS.SE·2h ago
  • SkillReducer: Optimizing LLM Agent Skills for Token Efficiency
    arXiv CS.SE·2h ago
  • Machine Learning in the Wild: Early Evidence of Non-Compliant ML-Automation in Open-Source Software
    arXiv CS.SE·2h ago
  • EcoScratch: Cost-Effective Multimodal Repair for Scratch Using Execution Feedback
    arXiv CS.SE·2h ago
  • How and Why Agents Can Identify Bug-Introducing Commits
    arXiv CS.SE·2h ago
  • Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
    arXiv CS.SE·2h ago
  • Sustainable AI Assistance Through Digital Sobriety
    arXiv CS.SE·2h ago
  • Software Vulnerability Detection Using a Lightweight Graph Neural Network
    arXiv CS.SE·2h ago
  • Designing FSMs Specifications from Requirements with GPT 4.0
    arXiv CS.SE·2h ago
  • Logging Like Humans for LLMs: Rethinking Logging via Execution and Runtime Feedback
    arXiv CS.SE·2h ago
  • Kwame 2.0: Human-in-the-Loop Generative AI Teaching Assistant for Large Scale Online Coding Education in Africa
    arXiv CS.CL·2h ago
  • CADEL: A Corpus of Administrative Web Documents for Japanese Entity Linking
    arXiv CS.CL·2h ago
  • SiPaKosa: A Comprehensive Corpus of Canonical and Classical Buddhist Texts in Sinhala and Pali
    arXiv CS.CL·2h ago
  • MemRerank: Preference Memory for Personalized Product Reranking
    arXiv CS.CL·2h ago
  • The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages
    arXiv CS.CL·2h ago