Python’s Pervasive Portfolio

Does your business depend on Python to function on a day-to-day basis? If you said no, you should double-check. Python is everywhere. It wasn’t all that long ago that Python was only used for ad hoc purposes such as writing test harnesses for web services or the occasional data preparation task, where performance wasn’t a concern. Of course, it has been used for several years for web development, thanks to popular frameworks such as Django and Flask, and, more recently, within the scientific computing and data science disciplines. While all these examples paint a picture of the versatility of Python, they don’t quite convey how pervasive a language Python has become or how strong a community and ecosystem have been built over the past 30 years.

For more articles about the future of data management in 2022, download Big Data Quarterly: Data Sourcebook (Winter 2021) Issue

A High-Level Development Language

It is important to understand how and why Python has gotten to where it is within all these different types of software solutions. Python is considered to be a high-level development language, meaning that it removes a lot of the boilerplate (or redundant) code, making it a much more concise language. This, in turn, makes it easier to learn.
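As a rough illustration of that conciseness, here is a minimal sketch of a small data preparation task: counting the most frequent words in a piece of text. The function name `top_words` and the sample sentence are made up for this example; `Counter` comes from Python’s standard library.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most common lowercase words in text as (word, count) pairs."""
    # Lowercase the text and split on whitespace; no type declarations,
    # class wrapper, or explicit loop is needed for the counting itself.
    words = text.lower().split()
    return Counter(words).most_common(n)


# Hypothetical sample input, chosen only for illustration.
sample = "the quick brown fox jumps over the lazy dog the end"
print(top_words(sample, n=1))  # [('the', 3)]
```

The whole task fits in a handful of lines, with no class definition or type declarations wrapped around the logic.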

When comparing Python to a language such as Java and its ecosystem, it is clear that Python has had a significantly different path to adoption. Java promised a “write-once-run-anywhere” solution, which was very appealing. It quickly built up a support base and a flourishing suite of enterprise applications to drive adoption. While also a higher-level language, Java still required quite a bit of boilerplate code and it was still fairly complex to learn compared to Python. This has held Java back from certain audiences. Most importantly for this discussion, those in the scientific community just wanted the language to get out of the way so they could focus on science.

As most languages have faults or shortcomings, so, too, does Python. However, these shortcomings have led to a very strong community, collectively interested in working together to address the language’s limitations. This community is one of the strongest out there and is one of the biggest reasons for Python’s success.

Speed and Performance

Python has always been beaten up on performance, that is, how fast code executes. How much speed matters depends on who you are and on the purpose of the code. While Python has not historically been the fastest language, its ease of use provided a trade-off that most were willing to make. This makes sense in many respects: If someone can spend a few minutes writing 10 lines of code in Python versus many more minutes to write 50 lines in Java and they are only going to run the code a few times, why not save the time writing the code and lose a little during code execution? This approach is very appealing to the scientific computing and data science communities. However, as workloads have grown, so has the amount of time it takes for this code to execute. This left an opportunity for other languages such as Scala with Spark; however, Scala has failed to catch on due to its high level of complexity.

Given these longer runtimes, the community figured out a way to bring the Python story up a notch. Python can pack a serious performance punch when Python libraries are implemented in lower-level languages such as C++ and then, in turn, are exposed to Python users for easy consumption. The appeal of this approach is the ease of use of Python coupled with native C++ performance, which brings execution times down significantly.
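The same pattern can be seen in miniature inside the standard CPython interpreter, where built-ins such as `sum` are implemented in C. The sketch below is only an illustration under that assumption: it compares a hand-written Python loop (`python_sum`, a name made up for this example) with the built-in `sum`. Timings vary by machine and interpreter, so none are quoted here.

```python
import timeit


def python_sum(values):
    """Add up values with an explicit loop executed by the Python interpreter."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(200_000))

# Both approaches must agree on the answer before the timings mean anything.
assert python_sum(values) == sum(values)

# Time five runs of each; in CPython the built-in sum runs in compiled C code.
loop_time = timeit.timeit(lambda: python_sum(values), number=5)
builtin_time = timeit.timeit(lambda: sum(values), number=5)
print(f"Python loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```

Libraries such as NumPy apply the same idea at a much larger scale: the call is made from Python, while the heavy lifting happens in compiled code.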

Math and Science

Given that Python has a dynamic 30-year history, for the sake of brevity, let’s cut to the chase. Python absolutely dominates other languages when it comes to scientific computing and data science. We are not getting into a Fortran-type of argument here. Instead, we are looking at the ubiquity of Python as it relates to math and science problems, which, of course, are foundational disciplines for all the hottest topics in the industry such as machine and deep learning. No wonder Python is popular.

There are many people and many reasons contributing to this pervasiveness. I think one stands above the rest: NumFOCUS, specifically its PyData program and the community it has built around data technologies in the Python ecosystem.

There are quite a few libraries in this ecosystem that are foundational to many use cases across industries, including, most notably, NumPy, pandas, scikit-learn, Matplotlib, Dask, and Jupyter.

Dask provides a framework to aid in scaling out Python workloads. It isn’t the easiest thing to use for those without a deeper level of engineering experience, but it does help with scaling beyond a single machine.
