speeding up data insertion from pandas dataframe to mysql

If you know the data format (here I assume all columns are floats), you can use numpy.savetxt() to drastically reduce the time needed to create the CSV:

%timeit df.to_csv(csv_fname)
2.22 s ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

from numpy import savetxt

%timeit savetxt(csv_fname, df.values, fmt='%f', header=','.join(df.columns), delimiter=',')
714 ms ± 37.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Please note that savetxt() only writes df.values (the index is dropped), so if your index carries data you may need to prepend

df = df.reset_index()

Suggestion : 2

I need to insert a 60000x24 DataFrame into a MySQL database (MariaDB) using SQLAlchemy and Python. The database runs locally and the data insertion runs locally as well. For now I have been using the LOAD DATA INFILE SQL query, but this requires the DataFrame to be dumped into a CSV file, which takes about 1.5-2 seconds. The problem is that I have to insert 40 or more of these DataFrames, so the time is critical. If I use df.to_sql the problem gets much worse: the data insertion takes at least 7 (and up to 30) seconds per DataFrame.


sql_query = "CREATE TABLE IF NOT EXISTS table(A FLOAT, B FLOAT, C FLOAT)"
# 24 columns of type float cursor.execute(sql_query) data.to_sql("table", con = connection, if_exists = "replace", chunksize = 1000)
sql_query = "CREATE TABLE IF NOT EXISTS table(A FLOAT, B FLOAT, C FLOAT)"
# 24 columns of type float cursor.execute(sql_query) data.to_sql("table", con = connection, if_exists = "replace", chunksize = 1000)
sql_query = "CREATE TABLE IF NOT EXISTS table(A FLOAT, B FLOAT, C FLOAT)"
# 24 columns of type float cursor.execute(sql_query) data.to_csv("/tmp/data.csv") sql_query = "LOAD DATA LOW_PRIORITY INFILE '/tmp/data.csv' REPLACE INTO TABLE 'table' FIELDS TERMINATED BY ',';"
cursor.execute(sql_query)
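Combining the two ideas above, one option is to keep the LOAD DATA INFILE approach but replace df.to_csv() with numpy.savetxt() for the dump. A minimal sketch in the style of the snippets above, assuming the same cursor and an all-float table; the function name, table name and CSV path are placeholders:

import numpy as np

def fast_insert(df, cursor, table_name="table", csv_path="/tmp/data.csv"):
    # savetxt is much faster than df.to_csv for purely numeric frames;
    # no header and no index are written, so every line is one table row.
    np.savetxt(csv_path, df.values, fmt="%f", delimiter=",")
    # table_name and csv_path are trusted local values here, not user input.
    cursor.execute(
        f"LOAD DATA LOW_PRIORITY INFILE '{csv_path}' "
        f"REPLACE INTO TABLE `{table_name}` FIELDS TERMINATED BY ','"
    )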

Suggestion : 3

This article describes different techniques for inserting data quickly into MariaDB. When inserting new data, several distinct steps cost time, and the techniques below address them roughly in order of importance. One background detail worth knowing: in many storage engines (at least MyISAM and Aria), ENABLE KEYS works by scanning through the row data, collecting the keys, sorting them, and then creating the index blocks. This is an order of magnitude faster than creating the index one row at a time, and it also uses less key buffer memory.

You can temporarily disable updating of non-unique indexes. This is mostly useful when there are zero (or very few) rows in the table into which you are inserting data.

ALTER TABLE table_name DISABLE KEYS;
BEGIN;
...inserting data with INSERT or LOAD DATA....
COMMIT;
ALTER TABLE table_name ENABLE KEYS;
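If the load is driven from Python, the same bracketing can be issued on the connection that performs the insert. A rough sketch with SQLAlchemy and pandas; the connection URL and the table name measurements are placeholders, and df is the DataFrame to insert (the article discusses DISABLE KEYS mainly for MyISAM/Aria tables):

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder URL

with engine.begin() as conn:
    conn.execute(text("ALTER TABLE measurements DISABLE KEYS"))
    df.to_sql("measurements", con=conn, if_exists="append", index=False, chunksize=1000)
    conn.execute(text("ALTER TABLE measurements ENABLE KEYS"))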

When inserting big amounts of data, integrity checks are significantly time-consuming. It is possible to disable the UNIQUE index checks and the foreign key checks using the unique_checks and foreign_key_checks system variables:

SET @@session.unique_checks = 0;
SET @@session.foreign_key_checks = 0;
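These are session variables, so they only have an effect if they are set on the same connection that subsequently runs the inserts. A sketch of that pattern from Python with SQLAlchemy; the URL and table name are placeholders and df is the DataFrame being loaded:

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder URL

with engine.begin() as conn:
    # Session-scoped: only this connection skips the checks, and only until it closes.
    conn.execute(text("SET @@session.unique_checks = 0"))
    conn.execute(text("SET @@session.foreign_key_checks = 0"))
    df.to_sql("measurements", con=conn, if_exists="append", index=False, chunksize=1000)
    conn.execute(text("SET @@session.unique_checks = 1"))
    conn.execute(text("SET @@session.foreign_key_checks = 1"))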

For InnoDB tables, the AUTO_INCREMENT lock mode can be temporarily set to 2, which is the fastest setting:

SET @@global.innodb_autoinc_lock_mode = 2;

You can also read a file locally on the machine where the client is running by using:

LOAD DATA LOCAL INFILE 'file_name'
INTO TABLE table_name;
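With LOCAL, the file is sent by the client, so the client library must also allow it. A sketch of how that might be enabled from Python with SQLAlchemy and the pymysql driver; the connection URL, CSV path and table name are placeholders, and the server still has to permit local_infile:

from sqlalchemy import create_engine, text

# pymysql's local_infile flag lets this client send local files to the server.
engine = create_engine(
    "mysql+pymysql://user:password@localhost/mydb",  # placeholder URL
    connect_args={"local_infile": 1},
)

with engine.begin() as conn:
    conn.execute(text(
        "LOAD DATA LOCAL INFILE '/tmp/data.csv' "
        "INTO TABLE `measurements` FIELDS TERMINATED BY ','"
    ))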

You can import many files in parallel with mariadb-import (mysqlimport before MariaDB 10.5). For example:

mysqlimport --use-threads=10 database text-file-name [text-file-name ...]