General Research

All posts under category "General Research"

786 posts total
Sorted by date
๊ทน์ง€ ํ•ด๋น™ ๊ฐ์†Œ์™€ ํŠธ๋žœ์Šค์•„ํฌํ‹ฑ ํ•ญ๋กœ ๊ฐ€๋Šฅ์„ฑ ํ‰๊ฐ€ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ์˜คํ”„๋ผ์ธ ๊ฒฝ๋กœ ํƒ์ƒ‰

๊ทน์ง€ ํ•ด๋น™ ๊ฐ์†Œ์™€ ํŠธ๋žœ์Šค์•„ํฌํ‹ฑ ํ•ญ๋กœ ๊ฐ€๋Šฅ์„ฑ ํ‰๊ฐ€ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ์˜คํ”„๋ผ์ธ ๊ฒฝ๋กœ ํƒ์ƒ‰

Climate-driven reductions in Arctic sea-ice extent have renewed interest in trans-Arctic shipping, yet adoption remains limited by questions of route feasibility, safety, and excess distance. Existing work often compares idealised great-circle shortcuts or uses detailed weather-routing systems that…

๋ชจ๋ฐ”์ผ GUI ์—์ด์ „ํŠธ ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ ๋ชจ๋“ˆํ˜• ๋‹ค๊ฒฝ๋กœ ์˜คํ”„๋ผ์ธ ๋ฒค์น˜๋งˆํฌ MobiBench

๋ชจ๋ฐ”์ผ GUI ์—์ด์ „ํŠธ ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ ๋ชจ๋“ˆํ˜• ๋‹ค๊ฒฝ๋กœ ์˜คํ”„๋ผ์ธ ๋ฒค์น˜๋งˆํฌ MobiBench

Mobile GUI agents, AI agents capable of interacting with mobile applications on behalf of users, have the potential to transform human-computer interaction. However, current evaluation practices for GUI agents face two fundamental limitations. First, they either rely on single-path offline benchmarks…

์‹œ๊ฐ ์ฆ๊ฐ• ์‚ฌ์œ  ์‚ฌ์Šฌ: ์ถ”๋ก  ๋‹จ๊ณ„์—์„œ ๋™์  ์ด๋ฏธ์ง€ ๋ณ€ํ™˜์œผ๋กœ VLM ๊ฒฌ๊ณ ์„ฑ ๊ฐ•ํ™”

์‹œ๊ฐ ์ฆ๊ฐ• ์‚ฌ์œ  ์‚ฌ์Šฌ: ์ถ”๋ก  ๋‹จ๊ณ„์—์„œ ๋™์  ์ด๋ฏธ์ง€ ๋ณ€ํ™˜์œผ๋กœ VLM ๊ฒฌ๊ณ ์„ฑ ๊ฐ•ํ™”

While visual data augmentation remains a cornerstone for training robust vision models, it has received limited attention in visual language models (VLMs), which predominantly rely on large-scale real data acquisition or synthetic diversity. Consequently, they may struggle with basic perception tasks…

์‹œ์ ๋ณ„ ์‹œ๊ฐ ์ „๋ฌธ๊ฐ€์™€ ์ž๊ธฐ์ง€๋„ ์œตํ•ฉ ๊ธฐ๋ฐ˜ ๋ฌด๋…ธ์ด์ฆˆ ๊ธฐํ•˜ํ•™์  ์‚ฌ์ „

์‹œ์ ๋ณ„ ์‹œ๊ฐ ์ „๋ฌธ๊ฐ€์™€ ์ž๊ธฐ์ง€๋„ ์œตํ•ฉ ๊ธฐ๋ฐ˜ ๋ฌด๋…ธ์ด์ฆˆ ๊ธฐํ•˜ํ•™์  ์‚ฌ์ „

…fusion network as an ensemble of timestep-dependent visual experts and self-supervisedly aggregates their heterogeneous priors into a single, clean, and complete geometric prior. Meanwhile, we utilize task-specific supervision to seamlessly adapt this noise-free prior to dense prediction tasks. Extensive…

AI ์„ค๋“ ๊ธฐ์ˆ ์ด ๋ฏผ์ฃผ์ฃผ์˜ ์—˜๋ฆฌํŠธ์˜ ์ •์ฑ… ํŽธํ–ฅ๊ณผ ์–‘๊ทนํ™” ์ „๋žต์„ ์žฌ๊ตฌ์„ฑํ•œ๋‹ค

AI ์„ค๋“ ๊ธฐ์ˆ ์ด ๋ฏผ์ฃผ์ฃผ์˜ ์—˜๋ฆฌํŠธ์˜ ์ •์ฑ… ํŽธํ–ฅ๊ณผ ์–‘๊ทนํ™” ์ „๋žต์„ ์žฌ๊ตฌ์„ฑํ•œ๋‹ค

In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only through limited instruments like schooling and mass media; advances in AI-driven persuasion sharply reduce the cost…

๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฐ˜ ์ œ์–ด์™€ ์™ธ๋ž€ ๊ด€์ธก๊ธฐ ๋ฐ ์ด๋ฒคํŠธ ํŠธ๋ฆฌ๊ฑฐ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ๊ฒฐํ•ฉํ•œ ํ†ตํ•ฉ ์ œ์–ด ๊ตฌ์กฐ

๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฐ˜ ์ œ์–ด์™€ ์™ธ๋ž€ ๊ด€์ธก๊ธฐ ๋ฐ ์ด๋ฒคํŠธ ํŠธ๋ฆฌ๊ฑฐ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ๊ฒฐํ•ฉํ•œ ํ†ตํ•ฉ ์ œ์–ด ๊ตฌ์กฐ

This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system…
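
The pieces named in this teaser can be sketched end to end on a toy plant. Everything below (the first-order plant, a fixed feedback gain standing in for the RL policy, a state-drift trigger, and the observer gains) is an illustrative assumption, not the architecture from the paper:

```python
# Minimal sketch of the ESO + event-triggered loop on a toy first-order plant.
# Plant, gains, and threshold are illustrative; the paper's RL-driven
# controller is replaced here by a fixed feedback gain.

def simulate(steps=2000, dt=0.01):
    a, b = -1.0, 1.0            # toy plant: x' = a*x + b*u + d
    x = 1.0                     # plant state
    z1, z2 = 0.0, 0.0           # ESO estimates: state and lumped disturbance
    beta1, beta2 = 20.0, 100.0  # observer gains
    k = 4.0                     # feedback gain (stand-in for the RL policy)
    eps = 0.02                  # event-trigger threshold
    u, x_last = 0.0, x          # held control input; state at last trigger
    updates = 0
    for t in range(steps):
        d = 0.5                 # constant external disturbance, unknown to u
        # Event-triggered mechanism: recompute u only when the state has
        # drifted enough since the last control update.
        if t == 0 or abs(x - x_last) > eps:
            u = (-k * z1 - z2) / b   # feedback + disturbance compensation
            x_last = x
            updates += 1
        # Plant step (explicit Euler).
        x += dt * (a * x + b * u + d)
        # ESO step: drive both estimates with the output error e = x - z1.
        e = x - z1
        z1 += dt * (a * z1 + b * u + z2 + beta1 * e)
        z2 += dt * (beta2 * e)
    return x, z2, updates

x_final, d_hat, n_updates = simulate()
print(f"state {x_final:.3f}, disturbance estimate {d_hat:.3f}, "
      f"updates {n_updates}/2000")
```

The observer recovers the constant disturbance while the trigger keeps control updates far below the number of simulation steps, which is the ETM's entire point.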

๊ตํ™˜ํ˜• ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฐ€์šฐ์‹œ์•ˆ ๊ธฐ๋ฐ˜ ๊ณ ์ •๋ฐ€ยท๊ณ ํ’ˆ์งˆ 3D ์žฌ๊ตฌ์„ฑ

๊ตํ™˜ํ˜• ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฐ€์šฐ์‹œ์•ˆ ๊ธฐ๋ฐ˜ ๊ณ ์ •๋ฐ€ยท๊ณ ํ’ˆ์งˆ 3D ์žฌ๊ตฌ์„ฑ

Figure 1: Comparison of 3DGS, 2DGS, and our EGGS. While 3DGS achieves high-fidelity appearance, it often produces inaccurate geometry, with imprecise surfaces and blurred edges. 2DGS improves geometric consistency across views but suffers from reduced appearance quality due to over-smoothed surfaces…

๋‹จ๋ฐฑ์งˆ ์–ธ์–ด ๋ชจ๋ธ์˜ ๋น ๋ฅธ ์ง€๋„ํ•™์Šต์œผ๋กœ ํšจ์œจ์ ์ธ ๋‹จ๋ฐฑ์งˆ ์„ค๊ณ„์™€ ํ˜์‹ ์  ์„œ์—ด ํƒ์ƒ‰

๋‹จ๋ฐฑ์งˆ ์–ธ์–ด ๋ชจ๋ธ์˜ ๋น ๋ฅธ ์ง€๋„ํ•™์Šต์œผ๋กœ ํšจ์œจ์ ์ธ ๋‹จ๋ฐฑ์งˆ ์„ค๊ณ„์™€ ํ˜์‹ ์  ์„œ์—ด ํƒ์ƒ‰

Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (PLMs) remains ad hoc. This is in part because high-quality annotated data are far more difficult to obtain…

๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ํžˆ์Šคํ…Œ๋ฆฌ์‹œ์Šค ๋ชจ๋ธ ์ž๋™ ์ถ”์ถœ์„ ์œ„ํ•œ ํ†ตํ•ฉ ๋‚ด๋ถ€ ๋ณ€์ˆ˜ ํ•™์Šต ๋ฐ ์‹ฌ๋ณผ๋ฆญ ํšŒ๊ท€ ํ”„๋ ˆ์ž„์›Œํฌ

๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ํžˆ์Šคํ…Œ๋ฆฌ์‹œ์Šค ๋ชจ๋ธ ์ž๋™ ์ถ”์ถœ์„ ์œ„ํ•œ ํ†ตํ•ฉ ๋‚ด๋ถ€ ๋ณ€์ˆ˜ ํ•™์Šต ๋ฐ ์‹ฌ๋ณผ๋ฆญ ํšŒ๊ท€ ํ”„๋ ˆ์ž„์›Œํฌ

Hysteresis is a nonlinear phenomenon with memory effects, where a system's output depends on both its current state and past states. It is prevalent in various physical and mechanical systems, such as yielding structures under seismic excitation, ferromagnetic materials, and piezoelectric actuators.
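
As a concrete instance of the memory effect this teaser describes, the classical Bouc-Wen model (a standard textbook formulation, not necessarily the one treated in the post) makes the dependence on both current state and history explicit through an internal variable:

```python
import math

# Bouc-Wen hysteresis: the output force depends on the current input x and
# on an internal memory variable z driven by the input history. Parameter
# values are illustrative, not taken from any identified model in the post.
def simulate_bouc_wen(cycles=3, dt=1e-3):
    A, beta, gamma, n = 1.0, 0.5, 0.5, 1   # hysteresis shape parameters
    alpha, k = 0.5, 1.0                    # elastic/hysteretic force split
    z = 0.0
    xs, forces = [], []
    for i in range(int(cycles * 2 * math.pi / dt)):
        t = i * dt
        x, dx = math.sin(t), math.cos(t)   # sinusoidal displacement input
        # Memory-variable evolution (standard Bouc-Wen form).
        dz = (A * dx
              - beta * abs(dx) * abs(z) ** (n - 1) * z
              - gamma * dx * abs(z) ** n)
        z += dt * dz
        xs.append(x)
        forces.append(alpha * k * x + (1 - alpha) * k * z)
    return xs, forces

xs, forces = simulate_bouc_wen()
i_down, i_up = 3141, 6283   # indices where x crosses 0, descending vs. ascending
# Same displacement, different force depending on loading direction:
print(forces[i_down], forces[i_up])
```

The two printed forces differ even though the displacement is identical at both instants; that gap is the hysteresis loop, and it is exactly the history dependence a data-driven extraction framework has to capture.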

๋‘ ๋‹จ๊ณ„ ์ž๊ธฐ์ง€๋„ ํ•™์Šต์œผ๋กœ ๊ตฌํ˜„ํ•œ ๊ณ ํšจ์œจ ์Œ์„ฑ ํ‘œํ˜„ ๋ฐ ์••์ถ• ํ”„๋ ˆ์ž„์›Œํฌ

๋‘ ๋‹จ๊ณ„ ์ž๊ธฐ์ง€๋„ ํ•™์Šต์œผ๋กœ ๊ตฌํ˜„ํ•œ ๊ณ ํšจ์œจ ์Œ์„ฑ ํ‘œํ˜„ ๋ฐ ์••์ถ• ํ”„๋ ˆ์ž„์›Œํฌ

We introduce a two-stage self-supervised framework that combines the Joint-Embedding Predictive Architecture (JEPA) with a Density Adaptive Attention Mechanism (DAAM) for learning robust speech representations. Stage 1 uses JEPA with DAAM to learn semantic audio features via masked prediction in latent…

๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์˜คํ† ์ธ์ฝ”๋”์˜ ๋ฆฌํ”„์‹œ์ธ  ํŠน์„ฑ ๋ถ„์„๊ณผ ์ฃผ์˜ ๊ธฐ๋ฐ˜ ์œตํ•ฉ ์•ˆ์ •ํ™” ๊ธฐ๋ฒ•

๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์˜คํ† ์ธ์ฝ”๋”์˜ ๋ฆฌํ”„์‹œ์ธ  ํŠน์„ฑ ๋ถ„์„๊ณผ ์ฃผ์˜ ๊ธฐ๋ฐ˜ ์œตํ•ฉ ์•ˆ์ •ํ™” ๊ธฐ๋ฒ•

In recent years, the development of multimodal autoencoders has gained significant attention due to their potential to handle multimodal complex data types and improve model performance. Understanding the stability and robustness of these models is crucial for optimizing their training, architecture…

๋ชจ๋‚˜๋”• ์ปจํ…์ŠคํŠธ ์—”์ง€๋‹ˆ์–ด๋ง: ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ์—์ด์ „ํŠธ ์„ค๊ณ„์˜ ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„

๋ชจ๋‚˜๋”• ์ปจํ…์ŠคํŠธ ์—”์ง€๋‹ˆ์–ด๋ง: ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ์—์ด์ „ํŠธ ์„ค๊ณ„์˜ ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„

The proliferation of Large Language Models (LLMs) has catalyzed a shift towards autonomous agents capable of complex reasoning and tool use. However, current agent architectures are frequently constructed using imperative, ad hoc patterns. This results in brittle systems plagued by difficulties in…

๋ฌด์„ ์ฃผํŒŒ์ˆ˜ ๋ผ๋””์–ธ์Šคํ•„๋“œ ๊ธฐ๋ฐ˜ ์‚ฌ์ „ํ•™์Šต์œผ๋กœ ์‹ค๋‚ด ์œ„์น˜์ถ”์ • ์ผ๋ฐ˜ํ™” ํ˜์‹ 

๋ฌด์„ ์ฃผํŒŒ์ˆ˜ ๋ผ๋””์–ธ์Šคํ•„๋“œ ๊ธฐ๋ฐ˜ ์‚ฌ์ „ํ•™์Šต์œผ๋กœ ์‹ค๋‚ด ์œ„์น˜์ถ”์ • ์ผ๋ฐ˜ํ™” ํ˜์‹ 

Radio frequency (RF)-based indoor localization offers significant promise for applications such as indoor navigation, augmented reality, and pervasive computing. While deep learning has greatly enhanced localization accuracy and robustness, existing localization models still face major challenges in…

๋ณ€ํ˜• ๋ง๋ฒ  ๊ธฐ๋ฐ˜ ๊ธ€๋กœ๋ฒŒ ์ปจํ…์ŠคํŠธ ํ•™์Šต์„ ํ†ตํ•œ 3D ์† ์ž์„ธ ์ถ”์ •

๋ณ€ํ˜• ๋ง๋ฒ  ๊ธฐ๋ฐ˜ ๊ธ€๋กœ๋ฒŒ ์ปจํ…์ŠคํŠธ ํ•™์Šต์„ ํ†ตํ•œ 3D ์† ์ž์„ธ ์ถ”์ •

Modeling daily hand interactions often struggles with severe occlusions, such as when two hands overlap, which highlights the need for robust feature learning in 3D hand pose estimation (HPE). To handle such occluded hand images, it is vital to effectively learn the relationship between local image…

๋ณ€ํ˜• ํŠธ๋žœ์Šคํฌ๋จธ ์ •์ฑ…์„ ์œ„ํ•œ ์ผ๋ฐ˜ํ™” ์ •์ฑ… ๊ทธ๋ž˜๋””์–ธํŠธ ์ •๋ฆฌ

๋ณ€ํ˜• ํŠธ๋žœ์Šคํฌ๋จธ ์ •์ฑ…์„ ์œ„ํ•œ ์ผ๋ฐ˜ํ™” ์ •์ฑ… ๊ทธ๋ž˜๋””์–ธํŠธ ์ •๋ฆฌ

We present the Generalized Policy Gradient (GPG) Theorem, specifically designed for Transformer-based policies. Notably, we demonstrate that both standard Policy Gradient Theorem and GRPO emerge as special cases within our GPG framework. Furthermore, we explore its practical applications in training…
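
The unification claimed here can be illustrated on a toy softmax bandit: vanilla policy gradient and a GRPO-style update are both weighted log-likelihood gradients and differ only in how the weight is built from rewards. The bandit, weights, and sample size below are illustrative assumptions, not the GPG construction itself:

```python
import math, random

# Toy 2-action softmax bandit. Both REINFORCE and a GRPO-style update take
# the form  grad_j = E[ w * (1[a == j] - pi_j) ]; only the weight w differs.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return [v / sum(exps) for v in exps]

def grad_estimate(weights, actions, probs):
    # d log pi(a) / d logit_j = 1[a == j] - pi_j  for a softmax policy.
    g = [0.0, 0.0]
    for w, a in zip(weights, actions):
        for j in range(2):
            g[j] += w * ((1.0 if a == j else 0.0) - probs[j])
    return [v / len(actions) for v in g]

def grpo_weights(rewards):
    # Group-normalized advantages: mean-center and scale within the batch.
    mu = sum(rewards) / len(rewards)
    sd = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mu) / sd for r in rewards]

random.seed(0)
probs = softmax([0.0, 0.0])
actions = [int(random.random() < probs[1]) for _ in range(512)]
rewards = [1.0 if a == 1 else 0.0 for a in actions]    # action 1 is better

g_pg = grad_estimate(rewards, actions, probs)          # raw returns as weights
g_grpo = grad_estimate(grpo_weights(rewards), actions, probs)
print(g_pg, g_grpo)   # both push the logit of action 1 up and action 0 down
```

Swapping the weight function while keeping the gradient estimator fixed is the sense in which both updates are instances of one parameterized family.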

์Šค์ผ€์ผ๋ง ์ทจ์•ฝ์ ์„ ์ด์šฉํ•œ ์ ์‘ํ˜• ์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ๊ณต๊ฒฉ ํ”„๋ ˆ์ž„์›Œํฌ

์Šค์ผ€์ผ๋ง ์ทจ์•ฝ์ ์„ ์ด์šฉํ•œ ์ ์‘ํ˜• ์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ๊ณต๊ฒฉ ํ”„๋ ˆ์ž„์›Œํฌ

Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decision-making to automated document processing. As these systems scale, they rely heavily on preprocessing pipelines to handle diverse…

์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ํ…์ŠคํŠธ ๊ด€์„ฑ ํ•ด์†Œ๋ฅผ ์œ„ํ•œ ์˜์‹์  ์‹œ์„  ์ œ์–ด

์‹œ๊ฐโ€‘์–ธ์–ด ๋ชจ๋ธ ํ…์ŠคํŠธ ๊ด€์„ฑ ํ•ด์†Œ๋ฅผ ์œ„ํ•œ ์˜์‹์  ์‹œ์„  ์ œ์–ด

Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding strategies intervene only at the output logits and thus cannot correct internal reasoning drift, while recent…

์‹œ์  ๋ณ€ํ™”์™€ ์›€์ง์ด๋Š” ์Œ์›์— ๋Œ€์‘ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ๋ฐ”์ด๋…ธ๋Ÿด ์˜ค๋””์˜ค ViSAudio

์‹œ์  ๋ณ€ํ™”์™€ ์›€์ง์ด๋Š” ์Œ์›์— ๋Œ€์‘ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ๋ฐ”์ด๋…ธ๋Ÿด ์˜ค๋””์˜ค ViSAudio

Comprehensive experiments demonstrate that ViSAudio outperforms existing state-of-the-art methods across both objective metrics and subjective evaluations, generating high-quality binaural audio with spatial immersion that adapts effectively to viewpoint changes, sound-source motion, and diverse acoustic…

์‹ค์‹œ๊ฐ„ ์ŠคํŠธ๋ฆฌ๋ฐ์„ ์œ„ํ•œ 4D ๊ฐ€์šฐ์‹œ์•ˆ ์Šคํ”Œ๋ž˜ํŒ… ์ตœ์ ํ™” ํ”„๋ ˆ์ž„์›Œํฌ AirGS

์‹ค์‹œ๊ฐ„ ์ŠคํŠธ๋ฆฌ๋ฐ์„ ์œ„ํ•œ 4D ๊ฐ€์šฐ์‹œ์•ˆ ์Šคํ”Œ๋ž˜ํŒ… ์ตœ์ ํ™” ํ”„๋ ˆ์ž„์›Œํฌ AirGS

Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality…

์—”ํŠธ๋กœํ”ผ ์‹ ํ˜ธ ๊ธฐ๋ฐ˜ ํšจ์œจ์  ๊ฐ•ํ™”ํ•™์Šต์œผ๋กœ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ์ถ”๋ก  ํ–ฅ์ƒ

์—”ํŠธ๋กœํ”ผ ์‹ ํ˜ธ ๊ธฐ๋ฐ˜ ํšจ์œจ์  ๊ฐ•ํ™”ํ•™์Šต์œผ๋กœ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ ์ถ”๋ก  ํ–ฅ์ƒ

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning…
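
Entropy collapse is easy to state concretely: as training sharpens the policy, its distributional entropy falls toward zero and exploration stops. One generic countermeasure, shown below purely as an illustration (the post's entropy-signal method may differ), is an entropy bonus in the objective:

```python
import math

# A sharpened policy has low distributional entropy; an entropy bonus in the
# objective resists collapse toward a near-deterministic distribution.

def entropy(probs):
    # Shannon entropy H(pi) = -sum p log p over the action distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_objective(reward, probs, coef=0.01):
    # Maximize reward + coef * H(pi): with equal reward, the less collapsed
    # distribution scores higher, preserving exploration pressure.
    return reward + coef * entropy(probs)

collapsed = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(entropy(collapsed), entropy(uniform))  # low vs. log(4) ~ 1.386
```

The coefficient trades accuracy pressure against exploration; entropy-aware RLVR methods differ mainly in how that trade-off is signalled and scheduled.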

์›น์‰˜ ํŒจ๋ฐ€๋ฆฌ ์ž๋™ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ ๋™์  ํ˜ธ์ถœ ์ถ”์ ๊ณผ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ํ‘œํ˜„ ์—ฐ๊ตฌ

์›น์‰˜ ํŒจ๋ฐ€๋ฆฌ ์ž๋™ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ ๋™์  ํ˜ธ์ถœ ์ถ”์ ๊ณผ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ํ‘œํ˜„ ์—ฐ๊ตฌ

Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samples…

์œ„ํ‚ค๋ฐฑ๊ณผ ๋Œ“๊ธ€ ๋ฌด๋ก€์„ฑ ํƒ์ง€๋ฅผ ์œ„ํ•œ ๊ทธ๋ž˜ํ”„ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ ๊ตฌ์กฐ์  ๋ถ„์„

์œ„ํ‚ค๋ฐฑ๊ณผ ๋Œ“๊ธ€ ๋ฌด๋ก€์„ฑ ํƒ์ง€๋ฅผ ์œ„ํ•œ ๊ทธ๋ž˜ํ”„ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ ๊ตฌ์กฐ์  ๋ถ„์„

Online incivility has emerged as a widespread and persistent problem in digital communities, imposing substantial social and psychological burdens on users. Although many platforms attempt to curb incivility through moderation and automated detection, the performance of existing approaches often remains…

์ด์ค‘ ์ถ”๋ก  ํ•™์Šต: ๊ธ์ •โ€‘๋ถ€์ • ๋…ผ๋ฆฌ๋ฅผ ๊ฒฐํ•ฉํ•œ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ณผํ•™์  ์ถ”๋ก  ๊ฐ•ํ™”

์ด์ค‘ ์ถ”๋ก  ํ•™์Šต: ๊ธ์ •โ€‘๋ถ€์ • ๋…ผ๋ฆฌ๋ฅผ ๊ฒฐํ•ฉํ•œ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ณผํ•™์  ์ถ”๋ก  ๊ฐ•ํ™”

Large Language Models (LLMs) have transformed natural language processing and hold growing promise for advancing science, healthcare, and decision-making. Yet their training paradigms remain dominated by affirmation-based inference, akin to modus ponens, where accepted premises yield predicted consequences…

์ž๋™ํ™”๋œ MDP ๋ชจ๋ธ๋ง๊ณผ ์ •์ฑ… ์ƒ์„ฑ์„ ์œ„ํ•œ ์—์ด์ „ํŠธํ˜• LLM ํ”„๋ ˆ์ž„์›Œํฌ Aโ€‘LAMP

์ž๋™ํ™”๋œ MDP ๋ชจ๋ธ๋ง๊ณผ ์ •์ฑ… ์ƒ์„ฑ์„ ์œ„ํ•œ ์—์ด์ „ํŠธํ˜• LLM ํ”„๋ ˆ์ž„์›Œํฌ Aโ€‘LAMP

Applying reinforcement learning (RL) to real-world tasks requires converting informal descriptions into a formal Markov decision process (MDP), implementing an executable environment, and training a policy agent. Automating this process is challenging due to modeling errors, fragile code, and misalignment…

ํ๋ธŒ๋ฒค์น˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ณต๊ฐ„ยท์ˆœ์ฐจ ์ถ”๋ก  ํ‰๊ฐ€

ํ๋ธŒ๋ฒค์น˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ณต๊ฐ„ยท์ˆœ์ฐจ ์ถ”๋ก  ํ‰๊ฐ€

We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark decomposes performance into five skills: (i) reconstructing cube faces from images and text, (ii) choosing the optimal next move, (iii) predicting…

ํ”Œ๋ผ์Šคํ‹ฑ์„ฑ ํšŒ๋ณต์„ ์œ„ํ•œ ํŠธ์œˆ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฆฌ์…‹ ๊ธฐ๋ฒ• AltNet

ํ”Œ๋ผ์Šคํ‹ฑ์„ฑ ํšŒ๋ณต์„ ์œ„ํ•œ ํŠธ์œˆ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฆฌ์…‹ ๊ธฐ๋ฒ• AltNet

Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability…

ํ”ฝ์…€ ๋™๋“ฑ ์ž ์žฌ ํ•ฉ์„ฑ์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…

ํ”ฝ์…€ ๋™๋“ฑ ์ž ์žฌ ํ•ฉ์„ฑ์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ณ ํ’ˆ์งˆ ์ด๋ฏธ์ง€ ์ธํŽ˜์ธํŒ…

Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents: Pixel-Equivalent Latent Compositing (PELC). An equivalent latent compositor should be the same as compositing…
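
The baseline this abstract pushes against fits in a few lines: average-pool the pixel mask down to latent resolution and linearly blend the known and generated latents. The 8x factor and toy shapes below mirror common latent-diffusion VAEs and are assumptions, not the PELC method itself:

```python
# Naive latent compositing: linearly interpolate two latents under a mask
# downsampled to latent resolution. Illustrative baseline, not PELC.

def downsample_mask(mask, factor=8):
    # Average-pool a binary pixel mask (nested lists) to latent resolution.
    h, w = len(mask), len(mask[0])
    return [[sum(mask[i + di][j + dj]
                 for di in range(factor) for dj in range(factor))
             / (factor * factor)
             for j in range(0, w, factor)]
            for i in range(0, h, factor)]

def naive_latent_composite(z_known, z_generated, pixel_mask):
    # z tensors are channels x h/8 x w/8 nested lists; mask value 1 keeps
    # the known-image latent, 0 takes the generated content.
    m = downsample_mask(pixel_mask)
    return [[[m[i][j] * zk + (1.0 - m[i][j]) * zg
              for j, (zk, zg) in enumerate(zip(rk, rg))]
             for i, (rk, rg) in enumerate(zip(ck, cg))]
            for ck, cg in zip(z_known, z_generated)]

# Toy example: one latent channel, 16x16 pixel mask keeping the left half.
mask = [[1.0] * 8 + [0.0] * 8 for _ in range(16)]
z_known = [[[5.0, 5.0], [5.0, 5.0]]]
z_gen = [[[0.0, 0.0], [0.0, 0.0]]]
z = naive_latent_composite(z_known, z_gen, mask)
print(z)  # [[[5.0, 0.0], [5.0, 0.0]]]
```

Because the mask is pooled, any pixel-level boundary detail is lost before blending, which is precisely the seam artifact a pixel-equivalent compositor aims to eliminate.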

< Category Statistics (Total: 5076) >

Electrical Engineering and Systems Science: 104
General Relativity: 64
General Research: 786
HEP-EX: 25
HEP-LAT: 8
HEP-PH: 82
HEP-TH: 65
MATH-PH: 74
NUCL-EX: 6
NUCL-TH: 14
Quantum Physics: 65
