java - Spring Data Cassandra driver gets stuck after a few hours, with a single-node database on the same node
I've been having problems with Apache Cassandra database access via spring-data-cassandra:
- sometimes the server cannot connect to the database at start - it typically works on the 2nd attempt
- once started, a couple of times an hour, at random moments, a few requests fail with a timeout, and then it continues working fine
- finally, after a few hours, the driver starts consistently refusing requests, reporting timeouts, and the server needs to be restarted
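For the first symptom (the connection failing at start but working on the 2nd attempt), a startup retry with backoff is one possible stopgap. This is only a minimal sketch of a workaround, not a fix for the underlying problem: `StartupRetry`, `retry`, and the parameter names are hypothetical helpers written for illustration, not part of Spring Data Cassandra or the DataStax driver, and the action passed in stands for whatever code opens the connection.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not a Spring or DataStax API): retries an action
// a fixed number of times, doubling the delay between attempts.
final class StartupRetry {

    static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();          // success: return the result immediately
            } catch (Exception e) {
                last = e;                      // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);     // wait before the next attempt
                    delayMs *= 2;              // exponential backoff
                }
            }
        }
        throw last;                            // all attempts failed: rethrow the last error
    }
}
```

It could be wrapped around the code that opens the Cassandra session, e.g. `StartupRetry.retry(() -> openSession(), 3, 1000L)`, where `openSession()` is a placeholder for the application's own connection code.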
The application is a small Spring Boot (1.4.0) server application using Spring Data Cassandra (I tried 1.4.2 and 1.4.4). The application collects data from remote clients and implements an administrative GUI based on a REST interface on the server side, including a dashboard prepared every 10 seconds using Spring @Scheduled tasks and delivering data to clients (browsers) via the WebSocket protocol. Traffic is secured using HTTPS and bilateral authentication (server + client certificates).
The current state of the application is being tested in a setup with the database (2.2.8) running on the same cloud server (connecting via the loopback 127.0.0.1 address), which runs Ubuntu 14.04. A couple of test clients create a load resulting in around 300k database records per hour (50k master and 5x50k detail records) being inserted, uploading data every 5 seconds or so. The dashboard trawls through the last hour of traffic and creates statistics. Average CPU use, as reported by the sar utility, is around 10%. The current database size is around 25 GB.
Data inserts are made in small batches - I've tried individual writes as well, but the problem hasn't disappeared; only the CPU usage increased to around 50% while testing with single writes.
I've done a lot of Google "research" on the topic and found nothing specific, but tried quite a few pieces of advice, e.g. putting the schema name in the queries, and a couple of configuration options - apparently with no effect on the final outcome (a blocked server needing a restart). The server has run for 30 hours or so, but sometimes gets blocked within 1-2 hours, and at other times runs 7-10 hours before the driver gets stuck, with no obvious pattern in the running period.
I've been monitoring the heap - nothing particular to see, no data structures piling up over time. The server is run with -Xms2g -Xmx3g -XX:+PrintGCDetails
The error appearing is:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: inpresec-cassandra/127.0.1.1:9042 (com.datastax.driver.core.OperationTimedOutException: [inpresec-cassandra/127.0.1.1:9042] Operation timed out))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:217) ~[cassandra-driver-core-2.1.9.jar!/:na]
    at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:44) ~[cassandra-driver-core-2.1.9.jar!/:na]
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276) ~[cassandra-driver-core-2.1.9.jar!/:na]
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374) ~[cassandra-driver-core-2.1.9.jar!/:na]
    ... 3 common frames omitted
What I have noticed is that the Cassandra process reports a virtual memory size approximately matching the size of the database - I noticed it when the database was around 12 GB, and it has been following the database size faithfully - not sure if it has anything to do with the server problem. The resident part of the database process is between 2 and 3 GB. The resident part of the server is typically 1.5-2.5 GB. The total memory of the cloud server is 8 GB.
Before running Cassandra directly in the host VM OS, I was running it in Docker and had the same problem - moving out of Docker was done to exclude Docker from the "list of suspects".
If anyone has had a similar problem, I'd appreciate any information or advice.
The problem has apparently been solved by upgrading Netty and adding support for the native epoll transport, which is then used instead of the default fallback to NIO. In pom.xml there was:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.9.Final</version>
</dependency>
Now it has been changed to:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.29.Final</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.0.29.Final</version>
    <!-- Explicitly bring in the linux classifier, test may fail on 32-bit linux -->
    <classifier>linux-x86_64</classifier>
    <scope>test</scope>
</dependency>
adding the second specification for explicit inclusion of epoll support, as suggested here.
After this change, the original message appearing in the log file:
com.datastax.driver.core.NettyUtil : Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
has changed into:
com.datastax.driver.core.NettyUtil : Found Netty's native epoll transport in the classpath, using it
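Besides reading the driver's log line, the same condition can be checked from application code. The following is a minimal sketch, assuming Netty's `io.netty.channel.epoll.Epoll` class with its static `isAvailable()` method (present in recent Netty 4.0.x releases, as far as I know); it is looked up via reflection so that the snippet also compiles and runs when the native transport is not on the classpath. `EpollCheck` and its methods are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic helper: reports whether Netty's native epoll
// transport can be loaded in the current JVM.
final class EpollCheck {

    // Returns true only if io.netty.channel.epoll.Epoll is on the classpath
    // and its static isAvailable() method returns true.
    static boolean epollAvailable() {
        try {
            Class<?> epoll = Class.forName("io.netty.channel.epoll.Epoll");
            Method isAvailable = epoll.getMethod("isAvailable");
            return (Boolean) isAvailable.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing, method missing, or native library failed to load.
            return false;
        }
    }

    // Human-readable summary of the result.
    static String describe(boolean available) {
        return available
                ? "native epoll transport is available"
                : "native epoll transport not available, NIO fallback expected";
    }

    public static void main(String[] args) {
        System.out.println(describe(epollAvailable()));
    }
}
```

Note that with `<scope>test</scope>` on the epoll dependency, such a check may give a different answer in tests than in the packaged application, so it is worth running it in the same environment as the server.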
Since then there have been no random failures - I tried "killing" the DB connection by creating very large queries several times - it dutifully reported a memory error, and then recovered.