Title: A Study of Language Usage Evolution in Open Source Software
ArXiv ID: 1102.2262
Date: 2011-02-14
Authors: Siim Karus (University of Tartu, Estonia; University of Zurich, Switzerland), Harald Gall (University of Zurich, Switzerland)
Abstract
The use of programming languages such as Java and C in Open Source Software (OSS) has been well studied. However, many other popular languages, such as XSL or XML, have received little attention. In this paper, we discuss some trends in OSS development that we observed when considering the evolution of multiple programming languages in OSS. Based on the revision data of 22 OSS projects, we tracked the evolution of language usage and of other artefacts such as documentation files, binaries and graphics files. In these systems, several different languages and artefact types have been used, including C/C++, Java, XML, XSL, Makefile, Groovy, HTML, Shell scripts, CSS, Graphics files, JavaScript, JSP, Ruby, Python, XQuery, OpenDocument files, PHP, etc. We found that the amount of code written in different languages differs substantially. Some of our findings can be summarized as follows: (1) JavaScript and CSS files most often co-evolve with XSL; (2) most Java developers, but only every second C/C++ developer, work with XML; and (3) more generally, we observed a significant increase in the usage of XML and XSL in recent years and found that Java or C is hardly ever the only language used by a developer. In fact, a developer works with more than 5 different artefact types (or 4 different languages) in a project on average.
A Study of Language Usage Evolution in Open Source Software
Siim Karus
University of Tartu, Estonia
University of Zurich, Switzerland
siim.karus@ut.ee
Harald Gall
University of Zurich, Switzerland
gall@ifi.uzh.ch
Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement – Restructuring, reverse engineering, and reengineering, version control; D.3.2 [Programming Languages]: Language Classifications – object-oriented languages, extensible languages; K.2 [Computing Milieux]: History of Computing – Software, People
General Terms
Management, Measurement, Documentation, Design, Experimentation, Human Factors, Languages.
Keywords
Programming language, Open source software, evolution, software
archives.
INTRODUCTION
Considerable effort has been put into studying the use of procedural languages such as C and object-oriented languages such as Java. Even less common languages such as Perl, Python, or Ruby have received their fair share of attention. However, the statistics of the most used languages show that one language, far more common than any of those mentioned above, stands out. According to ohloh.net¹, which tracks more than 400,000 open source software (OSS) repositories, about 15% of actively developed OSS projects contain XML, while less than 10% contain HTML, and other languages are present in less than 8% of projects. Moreover, XML is also the language with the most lines of code changed per month. The use of XML in OSS projects, however, has so far received little attention.
As XML is a markup language with little meaning on its own, it is worth understanding which other languages it is used with. Studying co-evolving file types allows us to investigate this issue. More generally, we can ask which languages and file types are used together, and therefore co-evolve, in OSS projects.
To address this research question, we studied 22 OSS repositories over 12 years. Our study focused on two levels of file type coupling: the developer level and the commit level. At the developer level, we studied the developers' language experience within the projects. For that, we addressed the following questions:
• Which languages and artefacts are commonly used in OSS development, and in what proportions?
• How many file types does a developer typically work with, and are there usage patterns for file types?
• How have the language usage and, as a consequence, the language expertise required of developers changed during the observation period?
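The developer-level questions can be made concrete with a small sketch. It assumes the revision history has been flattened into (author, file path) pairs and uses a hypothetical, much simplified extension-to-type mapping (`EXTENSION_TYPES`, `artefact_type`); the study's actual classification rules are not described in this section. The code collects the set of artefact types each developer touched and averages the set sizes, which is the kind of per-developer measure summarized in the abstract.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical, simplified mapping from file extension to artefact type.
# Assumption: the study's real classification rules are not given here.
EXTENSION_TYPES = {
    ".c": "C/C++", ".h": "C/C++", ".cpp": "C/C++",
    ".java": "Java", ".xml": "XML", ".xsl": "XSL",
    ".js": "JavaScript", ".css": "CSS", ".html": "HTML",
    ".sh": "Shell", ".py": "Python", ".png": "Graphics",
}


def artefact_type(path: str) -> str:
    """Classify a file path by extension; Makefiles have none, so match by name."""
    p = PurePosixPath(path)
    if p.name == "Makefile":
        return "Makefile"
    return EXTENSION_TYPES.get(p.suffix.lower(), "Other")


def types_per_developer(changes):
    """Map each author to the set of artefact types they changed.

    `changes` is an iterable of (author, path) pairs from the revision history.
    """
    seen = defaultdict(set)
    for author, path in changes:
        seen[author].add(artefact_type(path))
    return dict(seen)


def average_types(changes) -> float:
    """Average number of distinct artefact types per developer (0.0 if no data)."""
    per_dev = types_per_developer(changes)
    if not per_dev:
        return 0.0
    return sum(len(types) for types in per_dev.values()) / len(per_dev)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    changes = [
        ("alice", "src/Main.java"), ("alice", "conf/build.xml"),
        ("alice", "web/style.css"), ("bob", "lib/io.c"), ("bob", "Makefile"),
    ]
    print(types_per_developer(changes))
    print(average_types(changes))  # (3 + 2) / 2 = 2.5
```

Using sets means a developer who edits many XML files still counts XML only once, so the average reflects breadth of artefact types rather than volume of changes.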
At the commit level, we studied files that change together in the same commit. For that, we addressed the following questions:
• Which co-evolution patterns can be observed in OSS projects (e.g., are there distinct dependencies between languages or artefact types commonly edited together)?
• How have the dependencies between the file types used in the projects changed during the observation period?
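For the commit-level questions, one simple way to quantify co-evolution is to count, for every commit, each unordered pair of artefact types changed together. The sketch below assumes commits are given as lists of changed file paths and uses a hypothetical extension mapping; the pair-counting scheme and the `partners` ranking helper are illustrative assumptions, not necessarily the measures used in the study.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePosixPath

# Hypothetical, simplified extension-to-type mapping (an assumption).
EXTENSION_TYPES = {
    ".java": "Java", ".xml": "XML", ".xsl": "XSL",
    ".js": "JavaScript", ".css": "CSS", ".html": "HTML",
}


def artefact_type(path: str) -> str:
    """Classify a file path by extension; unknown extensions map to 'Other'."""
    return EXTENSION_TYPES.get(PurePosixPath(path).suffix.lower(), "Other")


def co_change_counts(commits):
    """Count unordered artefact-type pairs changed together, once per commit.

    `commits` is an iterable of lists of file paths changed in one commit.
    """
    counts = Counter()
    for paths in commits:
        # Deduplicate and sort so ("CSS", "XSL") and ("XSL", "CSS") are one key.
        types = sorted({artefact_type(p) for p in paths})
        counts.update(combinations(types, 2))
    return counts


def partners(counts, target):
    """Rank the types that co-change with `target`, most frequent first."""
    ranked = Counter()
    for (a, b), n in counts.items():
        if a == target:
            ranked[b] += n
        elif b == target:
            ranked[a] += n
    return ranked.most_common()


if __name__ == "__main__":
    # Invented toy commits; they do not reproduce the study's data.
    commits = [
        ["view/page.xsl", "view/page.js", "view/page.css"],
        ["view/list.xsl", "view/list.css"],
        ["src/Main.java", "pom.xml"],
    ]
    counts = co_change_counts(commits)
    print(partners(counts, "XSL"))  # [('CSS', 2), ('JavaScript', 1)]
```

Commits that touch only a single artefact type contribute no pairs, so this measure reflects cross-type coupling only; grouping commits by year before counting would be one way to approach the second question about how these dependencies changed over time.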
Additionally, at the more general level of the OSS projects studied, we were interested in what are the most common languages or artefact