Friday, March 7, 2014

Mysql 5.6.16 on OSX throws connection errors with innodb_file_per_table option

Description:
Under heavy workload mysql 5.6.16 (replicated on 5.6.13 and higher)
throws connection errors "Lost connection to MySQL server at 
'sending authentication information', system error: 32". 
This was tested on OSX 10.8 and 10.9. The issue is not seen 
on windows or other unix servers like centos.

How to repeat:
This was tested and the issue was repeatable with a simple database 
import through the mysql client.The database dump had around 1000 tables
 and 200 MB in size. While the dump is being sourced other mysql client(s)
 trying to connect (even if it were a different user than the one running
 the import) you get the authentication error.

"mysql -uroot -p
Enter password: 
ERROR 2013 (HY000): Lost connection to MySQL server at 
'sending authentication information', system error: 32"

Also observed that passing in the host causes mysql to 
throw a different exception:

"mysql -uroot -p -h127.0.0.1
Enter password: 
ERROR 2013 (HY000): Lost connection to MySQL server at 
'reading authorization packet', system error: 0"

Running mysql under debug (mysql-dubug) and generating a trace 
shows the below when the issue occurs:

[login_connection: info: login_connection called by thread 11
login_connection: info: New connection received on TCP/IP (1089)
login_connection: info: Host: unknown host  ip: 127.0.0.1
acl_authenticate: info: com_change_user_pkt_len=0
my_message_sql: error: error: 1158  message: 
'Got an error reading communication packets'
my_message_sql: error: error: 1043  message: 'Bad handshake'
one_thread_per_connection_end: info: thd 0x7f824d245200 block_pthread 1 
 
The interesting piece of trace being "login_connection: info: 
Host: unknown host  ip: 127.0.0.1" where it sees the host as 
"unknown host" which appears to be the cause of the authentication error.

This could also be seen under the performance schema where the HOST
 is seen as NULL while multiple clients try to connect:

mysql> select * from hosts;
+-----------+---------------------+-------------------+
| HOST      | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS |
+-----------+---------------------+-------------------+
| NULL      |                  18 |                35 |
| localhost |                   1 |                 2 |
+-----------+---------------------+-------------------+
2 rows in set (0.00 sec)

The counter "Aborted_connects" keeps increasing.

From the JDBC side, tested this with the recent connector/j and mariadb
java client and both fail with similar exceptions:

[Caused by: java.io.EOFException: 
unexpected end of stream, read 0bytes from 4
 at org.mariadb.jdbc.internal.common.packet.buffer.ReadUtil.readFully
(ReadUtil.java:84) [mariadb-java-client.jar:]
 at org.mariadb.jdbc.internal.common.packet.buffer.ReadUtil.readFully
(ReadUtil.java:92) [mariadb-java-client.jar:]
 at org.mariadb.jdbc.internal.common.packet.RawPacket.nextPacket
(RawPacket.java:77) [mariadb-java-client.jar:]
 at org.mariadb.jdbc.internal.common.packet.SyncPacketFetcher.getRawPacket
(SyncPacketFetcher.java:67) [mariadb-java-client.jar:]
 at org.mariadb.jdbc.internal.mysql.MySQLProtocol.connect
(MySQLProtocol.java:468) [mariadb-java-client.jar:]
 ... 69 more"]

Also observed that bouncing the mysql instance clears up the problem. 
It was also seen that terminating all the mysql clients like jdbc 
(shutting down the app servers) also helps with the issue and you are
 able to login once again.

my.cnf that was used to test this:
[
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

[mysqld]
interactive_timeout=180
wait_timeout=180
max_connect_errors=9999
skip-name-resolve
query_cache_size=0
bind-address=127.0.0.1

# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
innodb_buffer_pool_size = 2G

# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin

# These are commonly set, remove the # and set as required.
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = kris_dev
socket = /tmp/mysql.sock 

# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
join_buffer_size = 128M
sort_buffer_size = 2M
read_rnd_buffer_size = 2M 

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 

[client]
default-character-set=utf8

]

 
Workaround
 
This was tested on machine specs ranging from 4GB, 8GB and 16 GB and 32 GB RAM.
Things were a little better on the higher specs (16GB and above) but looked severe
on anything 8GB and below.

Further debugging and tracing has led to the mysql option "innodb_file_per_table" 
which is turned on by default on Mysql 5.6 to be causing this issue. Turning off 
this option has revealed positive results so far and no occurrence of 
system error: 32. 
 
It was also noticed that increasing open file limits (ulimit -n) 
on the higher spec machines helped with better results while running with
"innodb_file_per_table" turned on. 


Tools used: mysql-debug (trace option), dtruss

Logged this under: http://bugs.mysql.com/bug.php?id=71960

Hope this helps!