R vs Python: What do Data Scientists prefer?
R and Python are the most common programming languages in the data science world, but what exactly is the difference between the two?
This remains a commonly debated topic within the data science community. Both programming languages have their own strengths and limitations.
If you are a professional who is looking to start a career in this field, here are some key takeaways on both R and Python, along with trends that we are seeing in the Singapore and Hong Kong markets.
History of the two programming languages
R is a language and environment for statistical computing and graphics. According to the R Project, it is a GNU project (GNU being an operating system and an extensive collection of free software) that is similar to the S language, which was developed at Bell Laboratories. Like S, R provides a wide variety of statistical and graphical techniques.
Its functionalities include but are not limited to:
- Linear and nonlinear modelling
- Classical statistical tests
- Time-series analysis
- Classification and clustering
According to the R Project, R's strengths include:
- Free software that runs on a wide variety of UNIX platforms and similar systems (including Linux), as well as Windows and macOS
- The ease with which well-designed, publication-quality plots can be produced
- Full user control over design choices in graphics
- An integrated set of facilities for data manipulation, calculation and graphical display
- Effective data handling and storage
Overall, it is a simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.
Python is a widely used, general-purpose, high-level programming language. Created by Guido van Rossum and now stewarded by the Python Software Foundation, it was designed with an emphasis on code readability, allowing programmers to express concepts in fewer lines of code than in languages such as Java, C++ and C. The objective is readable code and greater developer productivity.
Its functionalities include:
- Developing and scripting code
- Generation of code and software testing
Due to its elegance and simplicity, top technology-driven organisations like Dropbox, Google, Quora, Mozilla, Hewlett-Packard, Qualcomm, IBM, and Cisco have adopted Python. Python has also influenced the design of many other languages, such as Ruby, Cobra, Boo, CoffeeScript, ECMAScript, Groovy, Swift, Go, OCaml and Julia.
R vs Python: which is the preferred choice?
Dr Norm Matloff, Professor of Computer Science at the University of California, Davis, wrote a paper on the key differences between the two languages. He compared R and Python across the following domains to determine which programming language was the better choice:
While this is subjective, Python greatly reduces the use of parentheses and braces when coding, making it sleeker, according to Matloff.
Winner: Python (but not by much)
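As a small illustration of the point about parentheses and braces, the sketch below shows how Python uses indentation to delimit blocks and needs no parentheses around conditions. The function name and sample values are made up for this example and do not come from Matloff's paper.

```python
def describe(values):
    """Label each number as negative, zero, or positive."""
    labels = []
    for v in values:          # no parentheses around the loop header
        if v < 0:             # no parentheses around the condition
            labels.append("negative")
        elif v == 0:
            labels.append("zero")
        else:
            labels.append("positive")
    return labels             # indentation, not braces, ends each block


print(describe([-2, 0, 5]))  # prints ['negative', 'zero', 'positive']
```

The equivalent R code would wrap each condition in parentheses and each block in braces, which is the kind of difference being described here.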
Python's massive growth in recent years is partially fuelled by the rise of machine learning and artificial intelligence (AI). Python offers a number of finely-tuned libraries for image recognition.
In Matloff's words, the power of the Python libraries comes from setting certain image-smoothing operations.
According to Matloff, data scientists working with Python must learn a lot of material to get started, including NumPy, Pandas and matplotlib. In contrast, matrix types and basic graphics are already built into base R. Novices can be doing simple data analyses within minutes, because these features are available without installing anything extra.
Winner: R (by far)
According to Matloff, advocates of Python, who mostly work in machine learning, sometimes have a poor understanding of the statistical issues involved. R, on the other hand, was written by statisticians, for statisticians. This suggests that subject matter experts working in R are better placed to ensure that the math behind their analyses is as accurate as possible.
Winner: It’s a draw
Matloff suggests that the base versions of R and Python do not have strong support for multicore computation. In his view, neither R's parallel package nor Python's multiprocessing package is a good workaround for this limitation. Nevertheless, external libraries supporting cluster computation are good in both languages, while Python has better interfaces to GPUs.
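For readers who have not used it, here is a minimal sketch of the basic pattern in Python's standard multiprocessing package. The helper names (sum_of_squares, split_into_chunks) and the workload are assumptions chosen for illustration, not taken from Matloff's paper. Note that each chunk is pickled and copied to a separate worker process, which is one source of overhead with this approach.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """CPU-bound work for one chunk; runs inside a worker process."""
    return sum(x * x for x in chunk)


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    values = list(range(100_000))
    chunks = split_into_chunks(values, n_chunks=4)
    # Each chunk is sent to one of four worker processes.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    print(sum(partial_sums))
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by launching a fresh interpreter, each worker re-imports the script, and the guard stops it from creating pools of its own.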
Python's machine learning library, Scikit-learn, is widely regarded as the 'gold standard'. It provides a wide selection of supervised and unsupervised learning algorithms. According to Towards Data Science, this library is "by far the easiest and cleanest ML library". Scikit-learn was created with a software engineering mindset. Its core API design revolves around being easy to use yet powerful, while still maintaining flexibility for research endeavours. This robustness makes it well suited to any end-to-end ML project, from the research phase right down to production deployments.
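To give a feel for that API design, the following is a minimal sketch of a typical Scikit-learn workflow, assuming the library is installed. The choice of the bundled iris dataset, the scaling step and the logistic regression model are illustrative assumptions, not recommendations from the sources quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the classifier into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# score() reports mean accuracy on the held-out data.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Most estimators in the library follow the same fit, predict and score pattern, so swapping in a different algorithm usually means changing a single line.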
What are the trends in Singapore and Hong Kong markets?
According to Donnie Maclary, Principal Consultant at Huxley Singapore, around 90% of the Data Science and Analytics roles that he fills call for candidates who are well versed in Python. This is because Python offers more flexibility than R.
If you are looking to grow your career in this field, it is thus best to focus on becoming familiar with the full Python toolset. Additionally, other in-demand skills for data professionals include SQL, Spark, Hadoop, Java, Amazon Web Services (AWS), Scala, and Kafka.
Huxley can help!
If you are a Data Science and Analytics professional who is looking to add top-tier talent to your team, please reach out to us via the Contact Us page. Do keep your eyes peeled for more updates within this space on our LinkedIn page.