When you insert one or a couple of rows into MySQL, you just use standard INSERT queries and don't give it a second thought. But when it comes to inserting a lot of data at once (e.g. loading data into a table), we can lose a lot of time waiting for the data to load.

First of all, tweak settings so MySQL can insert data in the most efficient way on your hardware.
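For example, here's a hedged sketch of knobs that are commonly relaxed for the duration of a bulk load (whether they're acceptable depends on your durability and integrity requirements, so restore them once loading is done):

SET GLOBAL innodb_flush_log_at_trx_commit = 2; -- flush the redo log ~once per second instead of on every commit
SET unique_checks = 0;                         -- skip secondary unique index checks while loading
SET foreign_key_checks = 0;                    -- skip foreign key validation while loading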

Why a lot of INSERTs is bad

If we load data as a series of individual INSERTs, we make the worst possible choice and harass our MySQL server: every statement costs a separate network round trip, gets parsed on its own, and (with autocommit on) triggers its own transaction commit and log flush.

Let's measure how long it takes on a small virtual 1-core node to insert 5000 rows row by row (using the Mysqly framework):

<?php
$start = microtime(true);

// insert 5000 rows one by one, each as a separate INSERT query
for ( $i = 0; $i < 5000; $i++ ) {
  mysqly::insert('bulk', ['col' => 'val', 'col2' => 'val2']);
}

echo microtime(true) - $start;

And it will take:

3.8

So we load data at around 1300 rows per second on this server. Loading millions of rows can take hours. Not cool.

Bulk insert

The simplest approach is to join multiple INSERTs into a single query using the multi-row VALUES format:

INSERT INTO bulk(col1, col2) VALUES('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3'), ...

Let's use this approach to load data (using Mysqly's multi_insert() method, which does exactly what we need):

<?php
$start = microtime(true);

// prepare 10000 rows and insert them all with a single multi-row query
$insert = [];
for ( $i = 0; $i < 10000; $i++ ) {
  $insert[] = ['val' => md5(mt_rand(1, time()))];
}
mysqly::multi_insert('bulk', $insert);

echo microtime(true) - $start;

This will give us much better performance:

0.25

Which, as we can see, is almost 30 times faster (and we've inserted twice as many rows), around 40k inserts per second. So this method should absolutely be used when we insert a lot of rows at once. The size of a bulk INSERT query is limited by max_allowed_packet, so either increase it or split the big query into multiple smaller ones that fit into the limit, as sketched below.
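Here's a minimal sketch of such splitting with array_chunk() (the 500-row batch size is an assumption; tune it to your row size and max_allowed_packet):

<?php
// insert rows in batches so each generated INSERT query stays under max_allowed_packet
foreach ( array_chunk($insert, 500) as $batch ) {
  mysqly::multi_insert('bulk', $batch);
}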

Load data from CSV

Now for a method that's not so popular; let's check whether that's fair. Assume we have our data prepared and stored in /tmp/data.csv. Loading data from a CSV is simple:

LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE bulk
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
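If the data originates in PHP, here's a minimal sketch of preparing that file with fputcsv(), whose default output matches the comma/double-quote format above ($rows is a hypothetical array of row arrays; also note that LOAD DATA INFILE reads the file on the server, so the FILE privilege and the secure_file_priv setting must allow it):

<?php
// write rows as comma-separated, double-quote-enclosed lines to /tmp/data.csv
$f = fopen('/tmp/data.csv', 'w');
foreach ( $rows as $row ) {  // $rows: array of arrays, one per table row
  fputcsv($f, $row);
}
fclose($f);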

This method allowed us to load 905k rows into the table within 5 seconds. That gives us a speed of around 180k rows per second, which is fantastic performance.

So in order to load large amounts of data into a MySQL table, use either LOAD DATA INFILE or a bulk INSERT...VALUES(),(),(),... statement.

written in Aug 2022