Why I Chose PostgreSQL Over MySQL – A Personal Journey

In today’s digital age, where data drives decisions and shapes outcomes, the choice of a Relational Database Management System (RDBMS) can significantly influence the trajectory of a project. Among the myriad options available, PostgreSQL and MySQL frequently emerge as the frontrunners. Both have etched their mark in the annals of database history, serving as the backbone for countless applications, from fledgling startups to tech behemoths. Their widespread adoption is a testament to their robustness, versatility, and adaptability. But as any seasoned developer will tell you, the devil is in the details. While both are stellar choices in their own right, nuances and subtle differences can make one more suitable than the other for specific scenarios. This blog is a reflection of my journey through these intricacies, and an exploration into why, after weighing the pros and cons, I found myself gravitating towards PostgreSQL.


To set the stage, let’s first appreciate the legacy of these two giants. MySQL, birthed in the mid-90s, quickly gained traction as a reliable and efficient RDBMS, especially for web-based applications. Its ease of use, coupled with a free and open-source model, made it a favorite among startups and established companies alike. Over the years, MySQL’s ecosystem has grown exponentially, with a plethora of tools, libraries, and community contributions enhancing its capabilities.


On the other hand, PostgreSQL, often dubbed the world’s most advanced open-source database, traces its roots back to the 80s. What sets it apart is not just its longevity but its unwavering commitment to SQL standards and its ability to introduce cutting-edge features often ahead of its peers. Its extensibility, allowing users to define their own data types, operators, and more, has made it a preferred choice for those looking to push the boundaries of what’s possible with relational databases.

Given this backdrop, one might wonder: with both databases offering so much, how does one even begin to choose? That’s a question I wrestled with as I embarked on a new project. On the surface, both seemed equally capable. Both had impressive benchmarks, a vast community of users, and a proven track record. But as I delved deeper, I realized that my specific needs and aspirations for the project would be better served by PostgreSQL. It wasn’t a decision made lightly or hastily. It was the culmination of extensive research, testing, and, most importantly, understanding the unique requirements of my project.

It’s worth noting that this isn’t a one-size-fits-all verdict. There are scenarios where MySQL might outshine PostgreSQL, and vice versa. The beauty of technology lies in its diversity and the plethora of options it offers. But every choice comes with its rationale, and in the subsequent sections of this blog, I aim to shed light on the factors that swayed my decision.

In sharing this journey, my hope is to provide insights that might help others standing at the same crossroads. Whether you’re a seasoned developer, a startup founder, or someone merely curious about databases, I invite you to join me as I unravel the reasons behind my choice, hoping it offers a fresh perspective on this age-old debate.

The Factors That Tipped the Scale

1. Advanced Data Types and Flexibility:
One of the first things that caught my attention was PostgreSQL’s support for a wide variety of data types, especially JSON and JSONB. In an era where applications often interact with web services and APIs, having native support for JSON data is invaluable. This not only streamlined my development process but also allowed for more efficient querying and data manipulation. Additionally, the ability to define custom data types meant I could tailor the database to the unique needs of my project.

Scenario: Imagine you’re building a web application that interacts with various third-party services, collecting user activity data. This data is often returned in JSON format. Instead of breaking the JSON into multiple relational tables or storing it as a plain text string, you want to store, query, and manipulate this JSON data directly in your database.

Using PostgreSQL’s JSON and JSONB:

A. Storing JSON Data

In PostgreSQL, you can create a table with a JSON or JSONB column:

CREATE TABLE user_activity (
    activity_data JSONB
);

B. Inserting JSON Data

Imagine you received the following JSON data from a third-party service:

{
    "user_id": "12345",
    "activity": "login",
    "timestamp": "2023-08-12T12:34:56Z"
}

You can insert this directly into your table:

INSERT INTO user_activity (activity_data)
VALUES ('{"user_id": "12345", "activity": "login", "timestamp": "2023-08-12T12:34:56Z"}');

C. Querying JSON Data

With PostgreSQL, you can query the JSON data directly. For instance, to find all activities of a specific user:

SELECT activity_data
FROM user_activity
WHERE activity_data->>'user_id' = '12345';
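Worth noting: `->>` returns the value as text, while `->` returns it as JSON, which lets you chain into nested objects. A quick illustration on the table above:

```sql
-- '->' yields JSON (chainable into nested structures); '->>' yields text
SELECT activity_data->'timestamp'  AS ts_as_json,
       activity_data->>'activity'  AS activity_as_text
FROM user_activity;
```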

D. Indexing JSON Data

For performance optimization, you can create indexes on JSONB data:

CREATE INDEX idx_user_activity ON user_activity USING GIN (activity_data);
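A GIN index like this one can serve JSONB containment queries. For example, the `@>` operator below asks for rows whose JSON contains the given key/value pair, and can be satisfied by the index above:

```sql
-- Find all login events; the GIN index can satisfy the @> containment test
SELECT activity_data
FROM user_activity
WHERE activity_data @> '{"activity": "login"}';
```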

Custom Data Types:

PostgreSQL allows you to define your own custom data types. Let’s say you often work with RGB color values and want a custom data type for them:

CREATE TYPE rgb_color AS (
    r INT,
    g INT,
    b INT
);

Now, you can create a table using this custom data type:

CREATE TABLE colored_items (
    name VARCHAR(255),
    color rgb_color
);

Inserting data into this table would look like:

INSERT INTO colored_items (name, color)
VALUES ('Red Shirt', (255, 0, 0));

The flexibility offered by PostgreSQL’s advanced data types, especially JSONB, allows developers to work with modern data structures directly, reducing the overhead of data manipulation and offering efficient querying capabilities. Custom data types further enhance this flexibility, enabling databases to be tailored to specific application needs.

2. Concurrency and Performance:
PostgreSQL’s implementation of Multi-Version Concurrency Control (MVCC) was another significant draw. This ensures that readers don’t block writers and vice-versa, leading to smoother database operations, especially in high-traffic scenarios. For a project that anticipated a large number of simultaneous users, this was a game-changer.

How MVCC Helps:

MVCC allows multiple transactions to access the same data without interfering with each other. Each transaction sees a “snapshot” of the data, ensuring consistent reads, while writes create a new version of the data.

A. Non-blocking Reads:

With MVCC, if a user is reading data (e.g., checking the inventory of an item), their query won’t be blocked by another user who’s trying to update the same data (e.g., making a purchase).


User A’s transaction:

-- User A checks the inventory
SELECT inventory_count FROM products WHERE product_id = 1;
-- This shows a consistent snapshot of the count; User B's concurrent
-- update never blocks this read

User B’s transaction:

-- User B makes a purchase, reducing the inventory by 1
UPDATE products SET inventory_count = inventory_count - 1 WHERE product_id = 1;

Even if User B’s update commits while User A’s query is running, User A still sees a consistent view of the data.
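One caveat: PostgreSQL’s default isolation level, Read Committed, takes a fresh snapshot for each statement rather than for the whole transaction. If User A wants a single consistent snapshot across several statements, they can start the transaction at the Repeatable Read level:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT inventory_count FROM products WHERE product_id = 1;
-- Later statements in this transaction see the same snapshot,
-- even after User B's update commits
SELECT inventory_count FROM products WHERE product_id = 1;
COMMIT;
```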

B. Conflicting Writes:

If two users try to purchase the last item simultaneously, row-level locking ensures that only one transaction succeeds in updating the inventory. Under the default Read Committed isolation level, the second transaction waits for the first to commit, re-evaluates its WHERE condition, and updates zero rows (under Serializable, it would receive a serialization error instead), preventing the inventory from going negative.


User A’s transaction:

-- User A tries to purchase the last item
UPDATE products SET inventory_count = inventory_count - 1 WHERE product_id = 1 AND inventory_count > 0;
-- If successful, this will reduce the inventory to 0

User B’s transaction (happening simultaneously):

-- User B also tries to purchase the last item
UPDATE products SET inventory_count = inventory_count - 1 WHERE product_id = 1 AND inventory_count > 0;
-- This will fail to update any rows since the condition `inventory_count > 0` is no longer met

Only one of these transactions will succeed in updating the inventory, ensuring data integrity.

MVCC in PostgreSQL ensures that multiple users can read and write to the database concurrently without blocking each other, leading to smoother operations even under high traffic. This is crucial for applications that need to maintain data integrity and responsiveness, such as e-commerce platforms during flash sales.

3. Extensibility and Customization:
The extensibility of PostgreSQL is truly remarkable. From spatial extensions like PostGIS to full-text search capabilities, the range of additional features I could seamlessly integrate was vast. This meant fewer third-party tools and a more cohesive database environment.

Scenario: Imagine you’re developing a location-based service, like a restaurant finder. You need to store location data for each restaurant and offer users the ability to search for restaurants within a certain radius of their current location. Additionally, you want to provide a full-text search feature for users to find restaurants based on reviews or cuisine types.

Using PostgreSQL Extensions:

A. Spatial Extensions with PostGIS:

PostGIS is an extension for PostgreSQL that adds support for geographic objects, allowing location-based queries.

Installing the PostGIS extension:

CREATE EXTENSION postgis;

Creating a table with spatial data:

CREATE TABLE restaurants (
    name VARCHAR(255),
    location GEOMETRY(Point, 4326),
    description TEXT
);

Inserting data:

INSERT INTO restaurants (name, location, description)
VALUES ('Tasty Bites', ST_SetSRID(ST_MakePoint(-70.01, 40.71), 4326), 'A delightful place with a variety of cuisines.');

Querying for restaurants within a certain radius:

SELECT name
FROM restaurants
WHERE ST_DWithin(
    location::geography,
    ST_SetSRID(ST_MakePoint(-70.00, 40.70), 4326)::geography,
    1000  -- Within 1km
);

B. Full-Text Search Capabilities:

PostgreSQL offers built-in full-text search capabilities, eliminating the need for third-party search solutions.

Creating an index for full-text search:

CREATE INDEX idx_restaurants_description ON restaurants USING gin(to_tsvector('english', description));

Searching for restaurants based on a keyword:

SELECT name
FROM restaurants
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'cuisine');

This query will return restaurants with descriptions containing the word “cuisine”.
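Results can also be ordered by relevance with `ts_rank`; a sketch building on the same table and index:

```sql
-- Rank matches by relevance to the search terms
SELECT name,
       ts_rank(to_tsvector('english', description),
               to_tsquery('english', 'cuisine')) AS relevance
FROM restaurants
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'cuisine')
ORDER BY relevance DESC;
```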

Dedicated spatial databases such as Esri’s ArcSDE exist, but integrating them into applications can be more complex; with PostGIS, spatial capabilities live directly inside PostgreSQL, offering a unified system for both spatial and non-spatial data. Likewise, solutions like Elasticsearch are powerful and offer extensive full-text search features, but for many applications PostgreSQL’s built-in full-text search is sufficient without the overhead of managing and synchronizing a separate search engine.

PostgreSQL’s extensibility, as demonstrated with PostGIS and its full-text search capabilities, allows developers to integrate advanced features directly into their database environment. This not only streamlines the development process but also reduces dependencies on external tools, leading to a more cohesive and efficient system. Whether you’re dealing with spatial data, text searches, or any other specialized need, there’s likely a PostgreSQL extension that can help.

4. Commitment to Open Source and Community Engagement:
While both databases are open source, PostgreSQL’s community-driven approach resonated with me. The active community not only ensures regular updates and security patches but also fosters an environment of collaboration and innovation. This gave me confidence in the longevity and future-proofing of my project.

5. Robustness and Data Integrity:
Data integrity is paramount, and PostgreSQL’s strict adherence to ACID (Atomicity, Consistency, Isolation, Durability) principles provided an added layer of assurance. I could trust that my data would remain consistent and reliable, even in challenging scenarios.

Scenario: Imagine you’re running an online banking platform. A common operation is transferring money from one account to another. This involves two main steps: deducting the amount from the sender’s account and adding it to the receiver’s account. It’s crucial that both these steps succeed or fail together. If one step succeeds and the other fails (due to a system crash or any other reason), it could lead to significant data inconsistency, like money disappearing into thin air.

How ACID Principles Help:

A. Atomicity: Ensures that all operations within a transaction are completed successfully; if not, the transaction is aborted.



BEGIN;

-- Deduct $100 from sender's account
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

-- Add $100 to receiver's account
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;


If there’s an issue in adding the amount to the receiver’s account, the entire transaction will be rolled back, ensuring the sender’s account isn’t wrongly deducted.

B. Consistency: Ensures that a transaction brings the database from one valid state to another.

In our example, the total amount of money in the system remains consistent before and after the transaction.
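Consistency can also be enforced declaratively. For instance, a CHECK constraint (shown here on an `accounts` table matching the example) guarantees that no transaction can ever leave a balance negative:

```sql
-- Any transaction that would overdraw an account now fails and rolls back
ALTER TABLE accounts
    ADD CONSTRAINT balance_non_negative CHECK (balance >= 0);
```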

C. Isolation: Ensures that concurrent execution of transactions leaves the database in the same state as if the transactions were executed sequentially.

If two users simultaneously try to transfer money from the same account, isolation ensures that the operations don’t interfere with each other, preventing potential overdraws.
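One way to make this explicit is to lock the sender’s row for the duration of the transfer with `SELECT ... FOR UPDATE`, so a concurrent transfer from the same account waits its turn:

```sql
BEGIN;
-- Lock the sender's row; a concurrent transfer from account 1 blocks here
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
```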

D. Durability: Once a transaction is committed, it remains so, even in the event of crashes, power failures, etc.

PostgreSQL ensures that once the money transfer is committed, the data is safely stored and won’t be lost.

PostgreSQL’s strict adherence to ACID principles ensures that complex operations, like transferring money between accounts, are handled reliably and consistently. This robustness and commitment to data integrity mean that developers can trust the database to handle critical operations, even in scenarios with high concurrency or system failures. This trust is paramount in applications like online banking, where data inconsistencies can have significant real-world consequences.

6. Advanced Query Capabilities:
PostgreSQL’s support for Window Functions and Common Table Expressions (CTEs) meant I could write more complex queries with greater ease and efficiency. This was particularly beneficial for the analytical aspects of my project.


Scenario: Imagine you’re managing a sales platform, and you want to analyze the monthly sales performance of each sales representative in comparison to the overall monthly average.

Using Common Table Expressions (CTEs):

CTEs allow you to create temporary result sets that can be easily referenced within your main query.


Suppose you want to find out the total sales of each representative in a particular month:

WITH MonthlySales AS (
    SELECT sales_rep, SUM(sales_amount) AS total_sales
    FROM sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
    GROUP BY sales_rep
)
SELECT * FROM MonthlySales;

Using Window Functions:

Window functions perform calculations across a set of rows related to the current row, providing more context to the data.


Building on the previous CTE, let’s find out how each representative’s sales compare to the monthly average:

WITH MonthlySales AS (
    SELECT sales_rep, SUM(sales_amount) AS total_sales
    FROM sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
    GROUP BY sales_rep
)
SELECT sales_rep,
    total_sales,
    AVG(total_sales) OVER () AS monthly_average,
    CASE
        WHEN total_sales > AVG(total_sales) OVER () THEN 'Above Average'
        ELSE 'Below Average'
    END AS performance
FROM MonthlySales;

In this query, AVG(total_sales) OVER() calculates the average sales for the month, and the CASE statement then classifies each representative’s performance as “Above Average” or “Below Average” relative to it.
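Window functions can go further than averages; for example, ranking the representatives within the month (reusing the same `MonthlySales` CTE):

```sql
WITH MonthlySales AS (
    SELECT sales_rep, SUM(sales_amount) AS total_sales
    FROM sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
    GROUP BY sales_rep
)
SELECT sales_rep,
       total_sales,
       RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
FROM MonthlySales;
```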

PostgreSQL’s support for advanced query capabilities, like CTEs and Window Functions, allows developers to write more expressive and efficient queries. These features are especially beneficial for analytical tasks, enabling deeper insights into data without resorting to complex subqueries or external processing. In scenarios like sales analysis, they provide a clearer picture of individual performance in the context of broader metrics, aiding in more informed decision-making.

Reflections and Moving Forward:

Looking back, while MySQL offered familiarity and a proven track record, PostgreSQL presented a suite of features that aligned more closely with my project’s vision and goals. It was like finding a tool that, while not necessarily better in every aspect, was just the right fit for the job at hand.

Of course, every project is unique, and what worked for me might not work for everyone. But that’s the essence of technology – it’s not about finding the best tool, but the right one. As I continue to explore the depths of PostgreSQL and harness its capabilities, I’m constantly reminded of the importance of aligning technology choices with project needs.

To those of you at the crossroads of this decision, I encourage you to look beyond the surface, understand the nuances, and choose what aligns best with your vision. And as you embark on your journey, may you find as much joy and discovery in the process as I did.
