TUTORIAL SERIES: Polyglot Data with Python: Introducing Pandas and Apache Arrow by Robson Junior

Nowadays Python is synonymous of data, but not necessarily the best choice for some data tasks. For example, exchange data between different ecosystems is one of the challenges for Python. Pandas and NumPy are very efficient and de facto tools to deal with a reasonable amount of data with performance, but they are limited outside of the Python ecosystem. Acquire and exchange data might be painful due to the problem to write slow conversion code or generated unnecessary large files to talk with other ecosystems, likes large CSV files. Apache Arrow playing with Pandas is a great option as technologies that handle these problems with an excellent performance playing natively with Python.

In this talk, Robson Junior aims to show you how to work in a heterogeneous environment with data coming from another ecosystem, be handled inside the Python ecosystem and send back to another ecosystem transparently.

Robson is a developer deeply involved with software communities, especially the Python community. He has been organizing conferences and meetups since 2011 and effectively speaking in conferences since 2012 about python and cloud technologies and since 2016 about data-related technologies. Also as an Independent consultant, he conducts on-demand architecture consultancy and training sessions about data-related technologies.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s