Challenges in Persian Electronic Text Analysis
📝 Original Info
- Title: Challenges in Persian Electronic Text Analysis
- ArXiv ID: 1404.4740
- Date: 2014-04-21
- Authors: Researchers from original ArXiv paper
📝 Abstract
Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. Farsi uses a unified Arabic script as its writing system. In this paper we briefly introduce the writing standards of Farsi and highlight the problems one faces when analyzing Farsi electronic texts, especially when developing Farsi corpora, with regard to the transcription and encoding of Farsi e-texts. The points mentioned may sound simple, but they are crucial when developing and processing written corpora of Farsi.
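As one illustration of the kind of encoding problem the abstract alludes to, the same Persian word can be stored with different Unicode code points depending on the keyboard or software that produced it, so two visually identical strings may not compare equal. The sketch below is a minimal example under stated assumptions: the two character pairs are chosen for illustration and are not taken from the paper, and `normalize_persian` is a hypothetical helper name.

```python
# Illustrative mapping only; the abstract does not list specific characters.
# Both pairs are real Unicode code points that render similarly in many fonts.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

# Precompute a translation table for str.translate.
_TABLE = str.maketrans(ARABIC_TO_PERSIAN)


def normalize_persian(text: str) -> str:
    """Hypothetical helper: replace Arabic look-alike letters with Persian ones."""
    return text.translate(_TABLE)


# The word "ketab" (book) typed with an Arabic kaf and with a Persian keheh.
with_arabic_kaf = "\u0643\u062A\u0627\u0628"
with_persian_keheh = "\u06A9\u062A\u0627\u0628"

# The raw strings differ even though they look alike on screen.
raw_equal = with_arabic_kaf == with_persian_keheh

# After normalization they compare equal.
normalized_equal = (
    normalize_persian(with_arabic_kaf) == normalize_persian(with_persian_keheh)
)
```

Applying a step like this before tokenizing or counting words keeps variant spellings of one word from being treated as distinct corpus entries. A real corpus pipeline would likely need a longer, carefully reviewed mapping.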