Polars (software)

From Wikipedia, the free encyclopedia

Polars
Original author(s): Ritchie Vink
Developer(s): Community
Repository: github.com/pola-rs/polars/
Written in: Rust
Operating system: Cross-platform
Type: Technical computing
License: MIT License
Website: pola.rs

Polars is an open-source software library for data manipulation. It is built around an OLAP query engine implemented in Rust and uses the Apache Arrow columnar format as its memory model. Although Polars is written in Rust, it offers Python, Node.js, R, and SQL interfaces.

History

The first code commit was made on June 23, 2020.[1] Ritchie Vink and Chiel Peters co-founded a company to develop Polars after working together at the company Xomnia for five years. In 2023, Vink and Peters closed a seed round of approximately $4 million, led by Bain Capital Ventures.

Features

The core object in Polars is the DataFrame, as in other data processing libraries.[2] Contexts and expressions are central concepts in Polars' syntax. An expression describes a computation or transformation performed on data columns, and a context is the specific environment in which an expression is evaluated.

Polars has three main contexts:

  • selection: choosing columns from a DataFrame
  • filtering: subsetting a DataFrame by keeping rows that meet specified conditions
  • group by/aggregation: calculating summary statistics within subgroups of the data

Compared to pandas

Feature differences

Because Polars was designed to work on a single machine, it is often compared with pandas, a similar data manipulation library.[3] One major advantage reported for Polars over pandas is performance: Polars is described as 5 to 10 times faster than pandas on similar tasks. Memory requirements also differ: pandas requires around 5 to 10 times as much RAM as the size of the dataset, compared with the 2 to 4 times needed for Polars. Polars is also designed to use lazy evaluation, in which a query optimizer examines all steps of a query and chooses an efficient way to execute them, whereas pandas uses eager evaluation, in which each step is performed immediately. Research comparing pandas and Polars on data analysis tasks shows that Polars is more memory-efficient than pandas.[4]

Syntax differences

Polars and pandas have similar syntax for reading in data with a read_csv() function, but differ in how a rolling mean is calculated.[5]

Code using pandas:

import pandas as pd

# Read in data
df_temp = pd.read_csv(
    "temp_record.csv", index_col="date", parse_dates=True, dtype={"temp": int}
)

# Explore data
print(df_temp.dtypes)
print(df_temp.head())

# Calculate rolling mean
df_temp.rolling(2).mean()

Code using Polars:

import polars as pl

# Read in data
# (recent Polars releases use schema_overrides; older releases named it dtypes)
df_temp = pl.read_csv(
    "temp_record.csv", try_parse_dates=True, schema_overrides={"temp": int}
).set_sorted("date")

# Explore data
print(df_temp.dtypes)
print(df_temp.head())

# Calculate rolling average
df_temp.rolling("date", period="2d").agg(pl.mean("temp"))

References

  1. ^ "Company announcement". www.pola.rs. Retrieved 13 May 2025.
  2. ^ Real Python. "Python Polars: A Lightning-Fast DataFrame Library – Real Python". realpython.com. Retrieved 13 May 2025.
  3. ^ "Polars vs. pandas: What's the Difference? | The PyCharm Blog". The JetBrains Blog. 4 July 2024. Retrieved 13 May 2025.
  4. ^ Nahrstedt, Felix; Karmouche, Mehdi; Bargieł, Karolina; Banijamali, Pouyeh; Nalini Pradeep Kumar, Apoorva; Malavolta, Ivano (18 June 2024). "An Empirical Study on the Energy Usage and Performance of Pandas and Polars Data Analysis Python Libraries". Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. EASE '24. New York, NY, USA: Association for Computing Machinery. pp. 58–68. doi:10.1145/3661167.3661203. ISBN 979-8-4007-1701-7.
  5. ^ "How to Move From pandas to Polars | The PyCharm Blog". The JetBrains Blog. 19 June 2024. Retrieved 13 May 2025.

Further reading

  • Janssens, Jeroen; Nieuwdorp, Thijs (2025). Python Polars: The Definitive Guide (1st ed.). O'Reilly. ISBN 9781098156084.
  • Narayanan, Pavan Kumar (28 September 2024). "Data Wrangling using Rust's Polars". Data Engineering for Machine Learning Pipelines. Berkeley, CA: Apress. pp. 93–131. ISBN 979-8-8688-0601-8.