KOINEU

Matching Ranks Over Probability Yields Truly Deep Safety Alignment

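This paper proposes a new approach to strengthening the safety of large language models (LLMs). In particular, it gives an in-depth analysis of prefilling attacks and of methods for circumventing them.

1. Prefilling attacks and the RAP attack

In a prefilling attack, a user who sends a harmful request to an LLM pre-inserts affirmative text so that decoding starts from it. This makes it possible to extract harmful content even from a safety-aligned LLM. The RAP (Rank Assisted Prefilling) attack pairs prefilling with, at each decoding step, the top…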

Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning

Mirror Mode in Fire Emblem: Beating Players at their own Game with Imitation and Reinforcement Learning

Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators

More Consistent Accuracy PINN via Alternating Easy-Hard Training

Multi-granularity Interactive Attention Framework for Residual Hierarchical Pronunciation Assessment

Multi-view diffusion geometry using intertwined diffusion trajectories

Network of Theseus (like the ship)

Not All Transparency Is Equal: Source Presentation Effects on Attention, Interaction, and Persuasion in Conversational Search

ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems

The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries

The Effect of Document Summarization on LLM-Based Relevance Judgments

The Initialization Determines Whether In-Context Learning Is Gradient Descent

The Loss Landscape of Powder X-Ray Diffraction-Based Structure Optimization Is Too Rough for Gradient Descent

The Machine Learning Canvas: Empirical Findings on Why Strategy Matters More Than AI Code Generation

Towards 6G Native-AI Edge Networks: A Semantic-Aware and Agentic Intelligence Paradigm

Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings

Waveform-Based Probabilistic Seismic Hazard Analysis Using Ground-Motion Generative Models

Diffusion Language Model Inference Enhanced with Probabilistic Tree Search

A Comprehensive Framework for Automated Quality Control in the Automotive Industry

Do Large Language Models Walk Their Talk? Measuring the Gap Between Implicit Associations, Self-Report, and Behavioral Altruism

Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing

Differentially Private Rankings via Outranking Methods and Performance Data Aggregation

A HEAR-Based Framework for Music Aesthetics Evaluation

Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding

Learning Solution Operators for Partial Differential Equations via Monte Carlo-Type Approximation

Innovating Semantic Defect Detection with LLM-Based Git bisect

MultiBanAbs: A Comprehensive Multi-Domain Bangla Abstractive Text Summarization Dataset

Using Span Queries to Optimize for Cache and Attention Locality

Personalized Federated Learning for Organ-Agnostic Tumor Segmentation

Microearthquake Mechanisms of the 2022 Luding Earthquake Sequence Through Deep-Learning-Based Automatic P-Wave First-Motion Polarity Determination

ProxT2I: An Efficient Text-to-Image Diffusion Model Using Proximal Operators

A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges

Angular Steering: Behavior Control via Rotation in Activation Space

Arxiv 2512.23731

Assessing the Human-Likeness of LLM-Driven Digital Twins in Simulating Health Care System Trust

Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet

Bayesian Network Fusion of Large Language Models for Sentiment Analysis

Benchmarking LLM Agents for Wealth-Management Workflows

Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation

Circuits, Features, and Heuristics in Molecular Transformers

CodeFuse-CommitEval: Towards Benchmarking LLM's Power on Commit Message and Code Change Inconsistency Detection

Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics

Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models

Continual Error Correction on Low-Resource Devices

DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation

DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement

Enhancing Decision-Making in Windows PE Malware Classification During Dataset Shifts with Uncertainty Estimation

EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory
