Cpl hosted its first Meetup on March 9th in conjunction with Dublin R and attempted to answer the question, ‘Which language should I invest my time and effort in, R or Python?’
To have a bit of fun with the debate, the Meetup was playfully titled ‘R and Python: That Would be an Ecumenical matter’. As the Father Ted reference suggests, discussing whether R is better than Python is nearly as problematic as the data wrangling tools themselves.
The bottom line from the debate was that there is no easy answer. Each language is context specific and each has its own merits.
For those looking to pursue a career in data science, the question ‘Which language should I focus on?’ is a common one. Many of the attendees at the event had that question in mind. Both programming languages are gaining prominence in the data analytics community – with Python often featuring in the top five in several rankings. An O’Reilly Media survey in 2014 found that R is the most used tool by data scientists, followed by Python.
The case for R
R has a reputation as the world’s most widely used language for statistical computing and predictive analytics. It was noted during the debate that R is a statistician’s language, created for and maintained by statisticians. While this may drive some people to Python, it can be a blessing for those working with stats-heavy datasets. The R community is a very active one, and regularly contributes packages and add-ons that are only available in R. This can be quite useful for those working with the language and looking for niche tools.
Companies at the forefront of hiring data scientists, such as Google and Facebook, use R extensively. Microsoft recently acquired Revolution Analytics, the leading commercial provider of software and services for R. In doing so, Microsoft is attempting to help more companies and users unlock the power of R for big data insights.
The case for Python
Python, on the other hand, brings its own advantages to the table. It is a general purpose language that can handle a myriad of tasks. Many argue that Python is easier to learn, especially if you are already familiar with Java or C++. For those familiar with SQL, the pandasql package lets you query pandas DataFrames using SQL syntax.
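To give a feel for how SQL knowledge carries over to Python, here is a minimal sketch. It is not pandasql itself, which is a third-party package. As a dependency-free stand-in it uses the sqlite3 module from Python's standard library to run an ordinary SQL query against an in-memory table. The attendee data and the function name `average_experience_by_language` are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (attendee, preferred language, years of experience)
rows = [
    ("Aoife", "R", 5),
    ("Brian", "Python", 2),
    ("Ciara", "Python", 7),
    ("Declan", "R", 3),
]


def average_experience_by_language(records):
    """Return {language: average years} using a plain SQL GROUP BY."""
    # An in-memory database disappears when the connection is closed
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE attendees (name TEXT, language TEXT, years INTEGER)"
        )
        conn.executemany("INSERT INTO attendees VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            "SELECT language, AVG(years) FROM attendees "
            "GROUP BY language ORDER BY language"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_experience_by_language(rows))
```

The query string is standard SQL, so anyone comfortable with GROUP BY can read it without knowing much Python.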
Python’s syntax was designed to be simple and readable, which allows beginners to start writing programs immediately, but it can also be scaled up for tasks that require more heavy lifting. It is used in a number of fields, from web development to gaming to systems automation. Disney uses Python to help power its creative process.
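As a rough illustration of that readability, the sketch below summarises some survey-style numbers using only the standard library. The figures and the function name `average_hours` are invented for the example and do not come from the Meetup.

```python
from statistics import mean

# Hypothetical survey responses: hours per week spent in each tool
hours = {
    "R": [10, 12, 8],
    "Python": [15, 9, 12],
}


def average_hours(responses):
    """Map each tool to the mean of its reported hours."""
    return {tool: mean(values) for tool, values in responses.items()}


if __name__ == "__main__":
    for tool, average in average_hours(hours).items():
        print(f"{tool}: {average} hours per week on average")
```

Even without prior Python experience, most readers can follow what each line does, which is the point made in the debate.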
However, data visualisation with Python is not as attractive and can be quite static, whereas R has ggplot2, one of the best visualisation tools around. Through ggplot2, data analysts can learn the syntax of R as well as how to think about data visualisation. Python is steadily building up its own toolkit with the likes of pandas, NumPy, SciPy and scikit-learn, which are making tasks that bit easier for data professionals and increasing the language's popularity.
In truth, both languages have their merits, and your ultimate choice may be dictated by your own needs or skillsets. The key point is that which language you choose means nothing unless you are familiar with the art of computer science itself. Being adept and competent in data munging, data analysis and data visualisation is much more important than the language you choose to do it in.
If you’re interested in becoming a data analyst, focus on the skills of data analytics first and then let your needs and personal choice dictate which language makes the most sense for you.