General Research

All posts in the category "General Research"

699 posts total
Sorted by date
์‹œ๊ฐ ์ฆ๊ฐ• ์‚ฌ์œ  ์‚ฌ์Šฌ: ์ถ”๋ก  ๋‹จ๊ณ„์—์„œ ๋™์  ์ด๋ฏธ์ง€ ๋ณ€ํ™˜์œผ๋กœ VLM ๊ฒฌ๊ณ ์„ฑ ๊ฐ•ํ™”

์‹œ๊ฐ ์ฆ๊ฐ• ์‚ฌ์œ  ์‚ฌ์Šฌ: ์ถ”๋ก  ๋‹จ๊ณ„์—์„œ ๋™์  ์ด๋ฏธ์ง€ ๋ณ€ํ™˜์œผ๋กœ VLM ๊ฒฌ๊ณ ์„ฑ ๊ฐ•ํ™”

While visual data augmentation remains a cornerstone for training robust vision models, it has received limited attention in visual language models (VLMs), which predominantly rely on large-scale real data acquisition or synthetic diversity. Consequently, they may struggle with basic perception task

Information-Theoretic Design and Performance Prediction of Compressor-Predictor Systems

Agentic language model (LM) systems power modern applications like 'Deep Research' and 'Claude Code,' and leverage multi-LM architectures to overcome context limitations. Beneath their apparent diversity lies a recurring pattern: smaller 'compressor' LMs (that can even run locally) distill raw conte…

์˜จ๋ผ์ธ ๋‹ค๊ธฐ๊ด€ ํ˜‘์—…์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ์†Œํ”„ํŠธ์›จ์–ด ๊ณตํ•™ ์—ฐ๊ตฌ ๊ฐ•์ขŒ

์˜จ๋ผ์ธ ๋‹ค๊ธฐ๊ด€ ํ˜‘์—…์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ์†Œํ”„ํŠธ์›จ์–ด ๊ณตํ•™ ์—ฐ๊ตฌ ๊ฐ•์ขŒ

Covid has made online teaching and learning acceptable and students, faculty, and industry professionals are all comfortable with this mode. This comfort can be leveraged to offer an online multi-institutional research-level course in an area where individual institutions may not have the requisite

Lightweight Distillation for Visual Alignment of Medical Vision-Language Models

Medical Large Vision-Language Models (Med-LVLMs) have shown promising results in clinical applications, but often suffer from hallucinated outputs due to misaligned visual understanding. In this work, we identify two fundamental limitations contributing to this issue: insufficient visual representat…

์ดˆ๊ณ ํ•ด์ƒ๋„ UAV ๊ธฐ๋ฐ˜ ์ •๋ฐ€ ํ™”์žฌ ํ™•์‚ฐ ์˜ˆ์ธก ๋ฐ์ดํ„ฐ์…‹ FireSentry์™€ FiReDiff ๋ชจ๋ธ

์ดˆ๊ณ ํ•ด์ƒ๋„ UAV ๊ธฐ๋ฐ˜ ์ •๋ฐ€ ํ™”์žฌ ํ™•์‚ฐ ์˜ˆ์ธก ๋ฐ์ดํ„ฐ์…‹ FireSentry์™€ FiReDiff ๋ชจ๋ธ

Fine-grained wildfire spread prediction is crucial for enhancing emergency response efficacy and decision-making precision. However, existing research predominantly focuses on coarse spatiotemporal scales and relies on low-resolution satellite data, capturing only macroscopic fire states while funda

A Conformal Hallucination-Estimation Metric for Quantifying Hallucinations and Ensuring Reliability in Convolution-Based Image Restoration

U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, challenges have emerged, as these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a no…

ํด๋ผ์šฐ๋“œ EDA ์ž‘์—… ์˜ˆ์ธก์„ ์œ„ํ•œ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ํŒŒ์ธํŠœ๋‹ ํ”„๋ ˆ์ž„์›Œํฌ

ํด๋ผ์šฐ๋“œ EDA ์ž‘์—… ์˜ˆ์ธก์„ ์œ„ํ•œ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ํŒŒ์ธํŠœ๋‹ ํ”„๋ ˆ์ž„์›Œํฌ

The rapid growth of cloud computing in the Electronic Design Automation (EDA) industry has created a critical need for resource and job lifetime prediction to achieve optimal scheduling. Traditional machine learning methods often struggle with the complexity and heterogeneity of EDA workloads, requi

ํ”„๋ฆฌํŠธ๋ ˆ์ธ๋œ ๋จธ์‹ ๋Ÿฌ๋‹ ์ธํ„ฐ์•กํ‹ฐ๋ธŒ ํฌํ…์…œ์˜ ๋ฏธ์„ธ์กฐ์ •์œผ๋กœ ๊ตฌ์กฐ ์ตœ์ ํ™” ์ •ํ™•๋„ 30% ํ–ฅ์ƒ

ํ”„๋ฆฌํŠธ๋ ˆ์ธ๋œ ๋จธ์‹ ๋Ÿฌ๋‹ ์ธํ„ฐ์•กํ‹ฐ๋ธŒ ํฌํ…์…œ์˜ ๋ฏธ์„ธ์กฐ์ •์œผ๋กœ ๊ตฌ์กฐ ์ตœ์ ํ™” ์ •ํ™•๋„ 30% ํ–ฅ์ƒ

Accurate structural relaxation is critical for advanced materials design. Traditional approaches built on physics-derived first-principles calculations are computationally expensive, motivating the creation of machine-learning interatomic potentials (MLIPs), which strive to faithfully reproduce firs

AI Persuasion Technology Reshapes the Policy Bias and Polarization Strategies of Democratic Elites

In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only through limited instruments like schooling and mass media; advances in AI-driven persuasion sharply reduce the cost…

A Unified Control Architecture Combining Reinforcement-Learning-Based Control with a Disturbance Observer and an Event-Triggered Mechanism

This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system…

๊ตํ™˜ํ˜• ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฐ€์šฐ์‹œ์•ˆ ๊ธฐ๋ฐ˜ ๊ณ ์ •๋ฐ€ยท๊ณ ํ’ˆ์งˆ 3D ์žฌ๊ตฌ์„ฑ

๊ตํ™˜ํ˜• ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฐ€์šฐ์‹œ์•ˆ ๊ธฐ๋ฐ˜ ๊ณ ์ •๋ฐ€ยท๊ณ ํ’ˆ์งˆ 3D ์žฌ๊ตฌ์„ฑ

Figure 1 : Comparison of 3DGS, 2DGS, and our EGGS. While 3DGS achieves high-fidelity appearance, it often produces inaccurate geometry, with imprecise surfaces and blurred edges. 2DGS improves geometric consistency across views but suffers from reduced appearance quality due to over-smoothed surfac

Cross-Modal Knowledge Distillation Bridging Local Field Potentials and Spikes: Improving Multi-Session LFP Transformer Models

Local field potentials (LFPs) can be routinely recorded alongside spiking activity in intracortical neural experiments, measure a larger complementary spatiotemporal scale of brain activity for scientific inquiry, and can offer practical advantages over spikes, including greater long-term stability…

Fast Supervised Fine-Tuning of Protein Language Models for Efficient Protein Design and Novel Sequence Exploration

Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (PLMs) remains ad hoc. This is in part because high-quality annotated data are far more difficult to obtain for p…

๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ํžˆ์Šคํ…Œ๋ฆฌ์‹œ์Šค ๋ชจ๋ธ ์ž๋™ ์ถ”์ถœ์„ ์œ„ํ•œ ํ†ตํ•ฉ ๋‚ด๋ถ€ ๋ณ€์ˆ˜ ํ•™์Šต ๋ฐ ์‹ฌ๋ณผ๋ฆญ ํšŒ๊ท€ ํ”„๋ ˆ์ž„์›Œํฌ

๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ํžˆ์Šคํ…Œ๋ฆฌ์‹œ์Šค ๋ชจ๋ธ ์ž๋™ ์ถ”์ถœ์„ ์œ„ํ•œ ํ†ตํ•ฉ ๋‚ด๋ถ€ ๋ณ€์ˆ˜ ํ•™์Šต ๋ฐ ์‹ฌ๋ณผ๋ฆญ ํšŒ๊ท€ ํ”„๋ ˆ์ž„์›Œํฌ

Hysteresis is a nonlinear phenomenon with memory effects, where a system's output depends on both its current state and past states. It is prevalent in various physical and mechanical systems, such as yielding structures under seismic excitation, ferromagnetic materials, and piezoelectric actuators.

A Highly Efficient Speech Representation and Compression Framework via Two-Stage Self-Supervised Learning

We introduce a two-stage self-supervised framework that combines the Joint-Embedding Predictive Architecture (JEPA) with a Density Adaptive Attention Mechanism (DAAM) for learning robust speech representations. Stage 1 uses JEPA with DAAM to learn semantic audio features via masked prediction in lat…

Lipschitz Analysis of Multimodal Autoencoders and an Attention-Based Fusion Stabilization Technique

In recent years, the development of multimodal autoencoders has gained significant attention due to their potential to handle complex multimodal data types and improve model performance. Understanding the stability and robustness of these models is crucial for optimizing their training, architecture…

Monadic Context Engineering: A New Paradigm for Large Language Model Agent Design

The proliferation of Large Language Models (LLMs) has catalyzed a shift towards autonomous agents capable of complex reasoning and tool use. However, current agent architectures are frequently constructed using imperative, ad hoc patterns. This results in brittle systems plagued by difficulties in s…

Advancing Indoor-Localization Generalization via Pretraining on Radio-Frequency Radiance Fields

Radio frequency (RF)-based indoor localization offers significant promise for applications such as indoor navigation, augmented reality, and pervasive computing. While deep learning has greatly enhanced localization accuracy and robustness, existing localization models still face major challenges in…

๋ณ€ํ˜• ๋ง๋ฒ  ๊ธฐ๋ฐ˜ ๊ธ€๋กœ๋ฒŒ ์ปจํ…์ŠคํŠธ ํ•™์Šต์„ ํ†ตํ•œ 3D ์† ์ž์„ธ ์ถ”์ •

๋ณ€ํ˜• ๋ง๋ฒ  ๊ธฐ๋ฐ˜ ๊ธ€๋กœ๋ฒŒ ์ปจํ…์ŠคํŠธ ํ•™์Šต์„ ํ†ตํ•œ 3D ์† ์ž์„ธ ์ถ”์ •

Modeling daily hand interactions often struggles with severe occlusions, such as when two hands overlap, which highlights the need for robust feature learning in 3D hand pose estimation (HPE). To handle such occluded hand images, it is vital to effectively learn the relationship between local image

๋ณ€ํ˜• ํŠธ๋žœ์Šคํฌ๋จธ ์ •์ฑ…์„ ์œ„ํ•œ ์ผ๋ฐ˜ํ™” ์ •์ฑ… ๊ทธ๋ž˜๋””์–ธํŠธ ์ •๋ฆฌ

๋ณ€ํ˜• ํŠธ๋žœ์Šคํฌ๋จธ ์ •์ฑ…์„ ์œ„ํ•œ ์ผ๋ฐ˜ํ™” ์ •์ฑ… ๊ทธ๋ž˜๋””์–ธํŠธ ์ •๋ฆฌ

We present the Generalized Policy Gradient (GPG) Theorem, specifically designed for Transformer-based policies. Notably, we demonstrate that both standard Policy Gradient Theorem and GRPO emerge as special cases within our GPG framework. Furthermore, we explore its practical applications in training
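
For readers unfamiliar with the baseline, the standard Policy Gradient Theorem that GPG generalizes can be sketched for a single-step softmax policy. The shapes and toy numbers below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def policy_gradient(theta, action, ret):
    """Gradient of ret * log pi_theta(action) for a softmax policy over logits.

    Single-step form of the classic Policy Gradient Theorem,
    grad J = E[ R * grad log pi(a|s) ]; for softmax logits the
    log-prob gradient is onehot(a) - pi.
    """
    pi = softmax(theta)
    onehot = np.zeros_like(theta)
    onehot[action] = 1.0
    return ret * (onehot - pi)

theta = np.zeros(3)  # uniform policy pi = [1/3, 1/3, 1/3]
g = policy_gradient(theta, action=1, ret=2.0)
# g == 2 * ([0, 1, 0] - 1/3): the taken action's logit is pushed up,
# the others down, in proportion to the return.
```

GRPO-style variants replace the raw return with a group-normalized advantage, which is one of the specializations the GPG framework is said to recover.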

์Šค์ผ€์ผ๋ง ์ทจ์•ฝ์ ์„ ์ด์šฉํ•œ ์ ์‘ํ˜• ์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ๊ณต๊ฒฉ ํ”„๋ ˆ์ž„์›Œํฌ

์Šค์ผ€์ผ๋ง ์ทจ์•ฝ์ ์„ ์ด์šฉํ•œ ์ ์‘ํ˜• ์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ๊ณต๊ฒฉ ํ”„๋ ˆ์ž„์›Œํฌ

Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decisionmaking to automated document processing. As these systems scale, they rely heavily on preprocessing pipelines to handle diverse i

Deliberate Gaze Control for Resolving Text Inertia in Vision-Language Models

Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding strategies intervene only at the output logits and thus cannot correct internal reasoning drift, while recent in…

์‹œ์  ๋ณ€ํ™”์™€ ์›€์ง์ด๋Š” ์Œ์›์— ๋Œ€์‘ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ๋ฐ”์ด๋…ธ๋Ÿด ์˜ค๋””์˜ค ViSAudio

์‹œ์  ๋ณ€ํ™”์™€ ์›€์ง์ด๋Š” ์Œ์›์— ๋Œ€์‘ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ๋ฐ”์ด๋…ธ๋Ÿด ์˜ค๋””์˜ค ViSAudio

Comprehensive experiments demonstrate that ViSAudio outperforms existing state-of-the-art methods across both objective metrics and subjective evaluations, generating high-quality binaural audio with spatial immersion that adapts effectively to viewpoint changes, sound-source motion, and diverse aco

AirGS: A 4D Gaussian Splatting Optimization Framework for Real-Time Streaming

Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-qua…

์‹ฌ๋ณผ๋ฆญ ๋“œ๋ผ์ด๋ธŒ ๋กœ์ปฌ ํผ์ŠคํŠธ ์ž์œจ์ฃผํ–‰ ๋ฐ์ดํ„ฐ ๋งˆ์ด๋‹ ํ”„๋ ˆ์ž„์›Œํฌ

์‹ฌ๋ณผ๋ฆญ ๋“œ๋ผ์ด๋ธŒ ๋กœ์ปฌ ํผ์ŠคํŠธ ์ž์œจ์ฃผํ–‰ ๋ฐ์ดํ„ฐ ๋งˆ์ด๋‹ ํ”„๋ ˆ์ž„์›Œํฌ

The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of 'Long-Tail' training data. While fleets collect petabytes of video logs, identifying rare safety-critical events (e.g., erratic jaywalking, construction diversions) remains a manual, cost-prohibitive process. Exis

Improving Large Language Model Reasoning with Efficient Entropy-Signal-Based Reinforcement Learning

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reason…
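
Entropy collapse here means the policy's action distribution becoming near-deterministic. A minimal illustration of the entropy signal and the usual entropy-bonus remedy (a toy sketch of the general idea, not this paper's method) is:

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a) of an action distribution."""
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum())

uniform = np.ones(4) / 4                      # maximal exploration: H = log 4
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # near-collapsed policy: H is small

def regularized_reward(reward, probs, beta=0.01):
    """Accuracy-style reward plus beta * H(pi), penalizing entropy collapse."""
    return reward + beta * entropy(probs)
```

A collapsed policy scores lower under the regularized objective even at equal accuracy, which keeps exploration alive during training.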

Dynamic Call Tracing and Graph-Based Representations for Automatic WebShell Family Classification

Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samp…

Graph Neural Network-Based Structural Analysis for Detecting Incivility in Wikipedia Comments

Online incivility has emerged as a widespread and persistent problem in digital communities, imposing substantial social and psychological burdens on users. Although many platforms attempt to curb incivility through moderation and automated detection, the performance of existing approaches often rem…

Dual Reasoning Learning: Strengthening Scientific Reasoning in Large Language Models by Combining Affirmative and Negative Logic

Large Language Models (LLMs) have transformed natural language processing and hold growing promise for advancing science, healthcare, and decision-making. Yet their training paradigms remain dominated by affirmation-based inference, akin to modus ponens, where accepted premises yield predicted conse…

์ž๋™ํ™”๋œ MDP ๋ชจ๋ธ๋ง๊ณผ ์ •์ฑ… ์ƒ์„ฑ์„ ์œ„ํ•œ ์—์ด์ „ํŠธํ˜• LLM ํ”„๋ ˆ์ž„์›Œํฌ Aโ€‘LAMP

์ž๋™ํ™”๋œ MDP ๋ชจ๋ธ๋ง๊ณผ ์ •์ฑ… ์ƒ์„ฑ์„ ์œ„ํ•œ ์—์ด์ „ํŠธํ˜• LLM ํ”„๋ ˆ์ž„์›Œํฌ Aโ€‘LAMP

Applying reinforcement learning (RL) to real-world tasks requires converting informal descriptions into a formal Markov decision process (MDP), implementing an executable environment, and training a policy agent. Automating this process is challenging due to modeling errors, fragile code, and misali

Cube Bench: Evaluating Spatial and Sequential Reasoning in Multimodal Large Language Models

We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark decomposes performance into five skills: (i) reconstructing cube faces from images and text, (ii) choosing the optimal next move, (iii) predict…

ํ”Œ๋ผ์Šคํ‹ฑ์„ฑ ํšŒ๋ณต์„ ์œ„ํ•œ ํŠธ์œˆ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฆฌ์…‹ ๊ธฐ๋ฒ• AltNet

ํ”Œ๋ผ์Šคํ‹ฑ์„ฑ ํšŒ๋ณต์„ ์œ„ํ•œ ํŠธ์œˆ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฆฌ์…‹ ๊ธฐ๋ฒ• AltNet

Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning a

ํ”ฝ์…€ ๋™๋“ฑ ์ž ์žฌ ํ•ฉ์„ฑ์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…

ํ”ฝ์…€ ๋™๋“ฑ ์ž ์žฌ ํ•ฉ์„ฑ์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…

Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents: Pixel-Equivalent Latent Compositing (PELC). An equivalent latent compositor should be the same as compositin
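
The baseline the abstract refers to, linearly blending VAE latents under a downsampled mask, can be sketched as follows; the shapes, the average-pool downsampling, and the factor-of-8 stride are illustrative assumptions, not details from the paper:

```python
import numpy as np

def linear_latent_composite(z_fg, z_bg, pixel_mask, down=8):
    """Common baseline latent composite: m * z_fg + (1 - m) * z_bg,
    where m is the pixel-space mask average-pooled to latent resolution.

    z_fg, z_bg: (C, H // down, W // down) VAE latents
    pixel_mask: (H, W) array in [0, 1]; 1 keeps the foreground latent
    """
    H, W = pixel_mask.shape
    h, w = H // down, W // down
    # Naive average-pool downsampling of the pixel mask to latent resolution
    m = pixel_mask.reshape(h, down, w, down).mean(axis=(1, 3))
    return m[None] * z_fg + (1.0 - m[None]) * z_bg

# With an all-ones mask this returns z_fg exactly; a pixel-equivalent
# compositor would instead aim to match compositing the decoded images.
```

The blur and seams this naive blend introduces at mask boundaries are, as the abstract suggests, what a pixel-equivalent formulation is meant to avoid.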

< Category Statistics (Total: 5005) >

General Relativity: 59
General Research: 699
HEP-EX: 14
HEP-LAT: 8
HEP-PH: 63
HEP-TH: 68
MATH-PH: 82
NUCL-EX: 5
NUCL-TH: 15
Quantum Physics: 57
