Recently I had the opportunity to build a cluster in a university lab. The setup consists of 11 Lenovo P400 ThinkStation machines, one head node and 10 compute nodes, running Rocks Cluster on CentOS 7. A single Gigabit Ethernet switch provides the interconnect.
The problem is that Fluent runs fine until it tries to save a case and data file, at which point the simulation is disrupted. The disruption happens while the file is being saved, just after saving, or a few iterations later. One of the nodes disconnects from the head node and shows up as a dead node in Ganglia, although it is still running (standalone), and the MPI job fails.
(For anyone who thinks the case file may be at fault: that is not the problem. The same case file runs fine on other workstations.)
The errors look like this:
MPI Application rank 16 exited before MPI finalized() with status 0
Node 71: Process 3384: Received Signal SIGTERM
Killed by Signal 15
the fluent process could not be started
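
To keep track of which node and process receive the signal across repeated runs, the "Node ...: Process ...: Received Signal ..." lines can be pulled out of the saved Fluent output with a short script. This is only a minimal sketch: it assumes the lines always have the format quoted above, and the function name `find_signalled_processes` is hypothetical, not part of Fluent or Rocks.

```python
import re

# Pattern assumed from the error line quoted above, e.g.
# "Node 71: Process 3384: Received Signal SIGTERM"
SIGNAL_LINE = re.compile(
    r"Node\s+(\d+):\s+Process\s+(\d+):\s+Received Signal\s+(\w+)"
)


def find_signalled_processes(log_text):
    """Return a list of (node, pid, signal) tuples found in the log text."""
    return [
        (int(node), int(pid), signal)
        for node, pid, signal in SIGNAL_LINE.findall(log_text)
    ]


if __name__ == "__main__":
    # Sample text copied from the error output in this post.
    sample = (
        "MPI Application rank 16 exited before MPI finalized() with status 0\n"
        "Node 71: Process 3384: Received Signal SIGTERM\n"
        "Killed by Signal 15\n"
    )
    for node, pid, signal in find_signalled_processes(sample):
        print(f"node {node}, process {pid}, signal {signal}")
```

Running this over the output of several failed runs would show whether it is always the same node that drops out, which should help separate a single bad machine or cable from a problem with the switch or the shared file system.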