Picking Apart Stack Overflow: What Bugs Developers The Most?
Stack Overflow has been swooping to the rescue of all types of developers since its founding in 2008. Since then, developers have asked millions upon millions of questions across all areas of development.
But what kinds of problems send developers to Stack Overflow in the first place?
We picked 11 of the most popular programming languages (measured by frequency of Stack Overflow tags) and ran a study looking to uncover some of the commonalities and differences within these questions.
But before we get there, let’s take a zoomed-out look at the 11 languages we’ve selected, as shown below.
(Either Python is fast becoming the most popular programming language, or Python just has a bigger proportion of new coders compared to other languages!)
But what exactly are these developers asking about? What are the most questioned frameworks, packages, functions, and methods? Which data types cause the most pain? And how different are these problems across languages?
To find out, we:
extracted 1,000 of the most upvoted Stack Overflow questions for each of the 11 programming languages listed above;
did a bit of data cleaning in Python (pandas, naturally).
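For the curious, here is a minimal sketch of what that cleaning-and-counting step could look like. It is not our exact pipeline: the `title` column name, the tiny `STOPWORDS` set, the `top_terms` helper, and the three sample titles are all made up for illustration.

```python
import pandas as pd

# Hypothetical stopword list; a real analysis would use a much longer one.
STOPWORDS = {"a", "an", "the", "to", "in", "of", "how", "do", "i",
             "is", "what", "and", "for", "with"}


def top_terms(questions: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n most frequent non-stopword tokens in question titles."""
    tokens = (
        questions["title"]               # assumed column name
        .str.lower()
        # Keep '+' and '#' inside tokens so that "c++" and "c#" survive.
        .str.findall(r"[a-z][a-z0-9_+#]*")
        .explode()                       # one token per row
        .dropna()                        # titles with no tokens become NaN
        .reset_index(drop=True)
    )
    tokens = tokens[~tokens.isin(STOPWORDS)]
    return tokens.value_counts().head(n)


# Made-up titles, purely to show the shape of the input.
sample = pd.DataFrame({"title": [
    "How do I merge two pandas dataframes?",
    "Pandas groupby to a list",
    "What is the difference between a list and a tuple?",
]})

print(top_terms(sample, n=3))
```

The resulting counts are the kind of term frequencies a word cloud is drawn from: the more often a term appears, the bigger it gets.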
Here are the results.
Python is a general-purpose duct-tape language that gets involved in many different domains of tech, which explains the relatively frequently questioned “django” (center-bottom) web development framework.
Perhaps the second language of choice for data scientists, R differs from Python in that it is almost exclusively used for that purpose. Data-processing concepts such as “dataframe” (top-right), “datatable” (top-right) and “matrix” (center) seem to be causing R users some headaches.
Both Python and R have excellent data manipulation libraries, though where data visualisation is concerned, some argue R has an edge over Python. Having said this, the data visualisation library “ggplot” (center) was by far the most questioned concept in the R language.
So perhaps Python users are finding matplotlib easier to handle!
First appearing in the mid-90s, Ruby has now found a home in the server-side web framework Ruby on Rails (“rails”, top-right).
C# (C Sharp, 2000) was developed by Microsoft primarily for its .NET framework (“net”, center-right).
C++ (1985) has gone on to become the go-to language for video game developers. The fundamental visual building block of 3D video games is the polygon, and the fundamental building block of the polygon is the “vector” (middle-right), although many of these questions likely concern std::vector, the standard library’s dynamic array, rather than geometric vectors.
Java (1995) was created as a general-purpose “write-once-run-anywhere” language. It became popular during the PC boom of the late 90s and the early days of the world wide web and was the driving force behind many Windows applications.
But more recently it’s found a home in “Android” (middle-right) app development.
One of the oldest languages in this study, Objective-C (1984) was the predominant language supported by Apple for the OS X operating system, and more recently, for “iOS” (bottom-left) apps on the “iPhone” (center)... that is, until the introduction of Swift.
First appearing in 2014, Swift has superseded Objective-C in the Apple development sphere. Though perhaps the frequency of “objective-c” mentions (middle-right) in Stack Overflow questions tagged #swift represents the thousands of iOS developers looking to Stack Overflow to update their knowledge.
PHP (1995) was designed as a server-side scripting language used for web development. It’s still used for that purpose today, and you can see evidence of this in the frequency of questions surrounding the language’s “laravel” framework (center-left).
SQL isn’t a fully featured programming language like some others in this study; it’s designed specifically for one job: data manipulation. Because of this specificity, the most common pain points for SQL all revolve around database access: “server”, “mysql”, “database”, “query”, “select”.
Each programming language has over time been geared toward - or was even designed for - a particular niche within tech. R is to data science as Swift is to iOS development as C++ is to video game development. This explains some of the differences in the types of problems that arise, such as why we see “database” as a commonly questioned concept in SQL but not, for example, in Objective-C.
Despite these obvious differences, these visualisations reveal some fundamental similarities across the different domains. Base-level data types such as strings and arrays (but not integers, floats, or boolean values, apparently) are frequent pain points that cause developers of all stripes and creeds to turn, keyboard under hand, to Stack Overflow.
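One way to check for shared pain points like these is to see which terms turn up in several languages' counts at once. The sketch below assumes you already have one term-frequency `Counter` per language; the `shared_terms` and `combine_counts` helpers and the toy numbers are invented for illustration and are not results from the study.

```python
from collections import Counter
from typing import Dict, Set


def shared_terms(per_language: Dict[str, Counter], min_languages: int = 2) -> Set[str]:
    """Return terms that appear in at least `min_languages` languages."""
    languages_seen = Counter()
    for counts in per_language.values():
        # Count each term once per language, however often it occurs there.
        languages_seen.update(set(counts))
    return {term for term, k in languages_seen.items() if k >= min_languages}


def combine_counts(per_language: Dict[str, Counter]) -> Counter:
    """Sum the per-language term counts into one overall Counter."""
    total = Counter()
    for counts in per_language.values():
        total.update(counts)
    return total


# Toy numbers, NOT taken from the study.
toy = {
    "python": Counter({"string": 3, "list": 5}),
    "java": Counter({"string": 4, "android": 6}),
    "sql": Counter({"string": 1, "query": 7}),
}

print(shared_terms(toy, min_languages=3))
print(combine_counts(toy).most_common(2))
```

Counting each term once per language keeps a single chatty language from dominating the overlap, while the combined counter is what a single all-languages word cloud would be built from.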
And in the spirit of unity, here’s a word cloud for all 11,000 of the questions we extracted:
Google can help with some questions...
...but for everything else, there’s Stack Overflow.