Ansys System Coupling jobs on a remote cluster do not give consistent results.

  • Last Post 08 November 2018
NOS_ATX posted this 05 September 2018

Hello All,

Our group has established an RSM connection to solve System Coupling (Transient Structural & Fluent) problems. However, we discovered that the RSM/cluster does not give consistent results. We have run a small test project many times, and the same project ended differently from run to run. Only a few runs succeeded and finished as expected, with a run time of around 8-12 hours. The other attempts either ran into connection/license problems or kept running until they reached the kill-off time (120 hours), which we could not understand. We did not see this issue when we ran the two modules individually. Can somebody help us figure out why our RSM solver behaves like this?

 

Thanks,

 

Boyuan

rwoolhou posted this 07 September 2018

To check, the solver results (contours etc) are the same for runs that complete, and the inconsistent behaviour is that the jobs don't always complete? 

If it's the latter, check what else is running on the cluster. If the licence is lost during the run, it can interrupt the solver or prevent additional activities from being triggered. For example, Fluent checks the licence at intervals during the run and when writing: if someone else grabs the Fluent licence during the Mechanical run, you have a problem.

Are there any errors visible in the Workbench window? 

NOS_ATX posted this 10 September 2018

Hi rwoolhou,

Yes, the solver results are the same, as we also compared with results finished locally.

The license should be fine, since we had that problem once before and it produced a clear warning/error. The only thing that looks strange to me is that during the run, the log showed the machine repeatedly transferring the file scLog.scl_ and also reporting that the file already exists. I wonder if the run got stuck in a loop.
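
One way to check for such a loop could be to count how often the file name appears in a saved job log and where the first and last occurrences are. The following is only a minimal Python sketch: the function name `find_marker` and the sample lines are hypothetical, and the real log format is not known here, so the marker string is just matched as plain text.

```python
def find_marker(log_lines, marker="scLog.scl"):
    """Count the lines that mention `marker` and report where they occur.

    `log_lines` is any iterable of strings (for example an open log file).
    Line numbers in the result are 1-based.
    """
    hits = [i for i, line in enumerate(log_lines, start=1) if marker in line]
    return {
        "count": len(hits),
        "first": hits[0] if hits else None,
        "last": hits[-1] if hits else None,
    }


# Made-up sample lines, only to show the shape of the result.
sample = [
    "job started",
    "transferring scLog.scl_",
    "solver step",
    "scLog.scl_ already exists",
]
print(find_marker(sample))
```

If the count keeps growing while the solver makes no progress, that would support the idea that the job is stuck repeating the same transfer.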

Thanks,

Boyuan 

rwoolhou posted this 11 September 2018

Thanks Boyuan. My suspicion would be on the IT side and whether some permissions are getting mixed up or timing out during the runs. Not something we can fix on an open forum, so you may need to ask your ANSYS contact to talk to the US support team. 

Kremella posted this 18 September 2018

Hello Boyuan,

Thank you for your patience. We are looking internally on how best to help you with your question.

Thank you.

Best Regards,

Karthik

jcallery posted this 19 September 2018

Hi Boyuan,

Could you please attach the RSM Job report from the failed job?

You can get that by opening the RSM Job Monitor, selecting the job, right click in the bottom pane, and select Save Job Report.

If you could send the log from a failed one, and one from a successful run, that would be helpful.

I have a feeling that the solver process may be crashing, and thus never finishing. That will be tough to tell from just the RSM log, but it's a place to start anyway.

 

Thank you,

Jake

NOS_ATX posted this 21 September 2018

Hi Jake,

I have saved some of the failed logs. However, I did not save a successful one. Can I still retrieve it? Also, how should I attach a log file on this open forum? Some of them are quite big (400 MB).

Best,

Boyuan

jcallery posted this 24 September 2018

Hi Boyuan,

Are you able to zip up that log file?  It should make it much smaller.

RSM Job Logs are cleaned up eventually, so you may not be able to retrieve them.

Could you create a much smaller project that uses the same types of systems and analysis, and run that as well? (one that solves quicker)

I would like to try and compare a successful log to the failed one.
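
A comparison like this can be sketched with Python's standard `difflib` module. This is only an illustration under assumptions: the function name `log_diff` and the sample lines are hypothetical, and real job reports probably carry timestamps that would have to be stripped first so that identical steps are not reported as differences.

```python
import difflib


def log_diff(ok_lines, failed_lines, context=2):
    """Return a unified diff between a successful and a failed job log.

    Both arguments are lists of lines without trailing newlines.
    """
    return list(
        difflib.unified_diff(
            ok_lines,
            failed_lines,
            fromfile="successful",
            tofile="failed",
            n=context,
            lineterm="",
        )
    )


# Made-up sample logs, only to show how the diff is read.
ok = ["submit job", "solver finished", "download results"]
failed = ["submit job", "solver still running", "download results"]
for line in log_diff(ok, failed):
    print(line)
```

Lines starting with `-` appear only in the successful log and lines starting with `+` only in the failed one, which narrows down where the two runs diverge.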

Thank you,

Jake

NOS_ATX posted this 25 September 2018

Hi Jake,

I have tried, but no run has succeeded since. I will attach a failed log for you.

Thanks!

Boyuan

Attached Files

NOS_ATX posted this 02 October 2018

Hi Jake,

Any thoughts on the attached failed log?

Thanks!

Boyuan

jcallery posted this 02 October 2018

Hi Boyuan,

I apologize, it doesn't seem I was notified by the forum that you had uploaded a file.

Let me look into it today.

Thank you,

Jake

jcallery posted this 02 October 2018

Hi Boyuan,

Can you see if the cluster admin will install the RSM Launcher service on a cluster submit node?

I think that will help to stabilize the issues that you are seeing.  I would really like to see if that helps.

Thank you,

Jake

NOS_ATX posted this 07 October 2018

Hi Jake,

Here is the response from the cluster admin:

 

"The cluster doesn't support running 'services' as a general rule; if this is something that can be installed and run by users on an as-needed basis within a submission, it might be an option.

 

If it's something that needs to run 24x7 with dedicated resources waiting for connections from outside the cluster, it's not something we can offer, but if it can run elsewhere and connect to the cluster over ssh to start jobs, then it might be possible.

 

Is there any official documentation about it?  I see other sites referencing it, but very little seems to be from ANSYS officially."

 

 

Best,

Boyuan

jcallery posted this 08 October 2018

Hi Boyuan, 

Yes, the official docs for RSM are here:

https://ansyshelp.ansys.com/account/secured?returnurl=/Views/Secured/corp/v192/wb_rsm/wb_rsm.html

 

You might be able to run the RSMLauncher service as your own user, but they will need to open a port on the cluster's firewall.

Is that an option?

 

Thank you,

Jake

NOS_ATX posted this 06 November 2018

Hi Jake,

Sorry for the long delay. I was out of the country for a while. The cluster admin responded:

"Based on your description, it may be possible; presumably the firewall change would be to allow incoming connections to the cluster.  That by itself won't work, we can't route internet traffic directly to processing nodes.

 

However, if you start a job that listens to a port on the processing node, you can use ssh's port forwarding so that you can log in to ghpcc06 and forward a port on your local pc through the ssh tunnel to the node running your job."
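
For reference, the port forwarding the admin describes could look roughly like the sketch below, which only builds the `ssh` command line and does not run it. The login host `ghpcc06` comes from the admin's reply; the node name `node123`, the port `5000`, and the helper `ssh_forward_command` are hypothetical placeholders, not values taken from the cluster or from RSM.

```python
import shlex


def ssh_forward_command(login_host, compute_node, remote_port, local_port, user=None):
    """Build an ssh command that forwards a local port to a compute node.

    -L local_port:compute_node:remote_port sets up the tunnel through the
    login host; -N tells ssh not to run a remote command.
    """
    target = f"{user}@{login_host}" if user else login_host
    forward = f"{local_port}:{compute_node}:{remote_port}"
    return ["ssh", "-N", "-L", forward, target]


# node123 and port 5000 are placeholders for the node and port of the running job.
cmd = ssh_forward_command("ghpcc06", "node123", 5000, 5000)
print(shlex.join(cmd))
```

With the tunnel open, a client on the local PC would connect to `localhost` on the local port instead of reaching the compute node directly.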

 

Again, we do not have connection problems when running CFD or FEA individually on the cluster. The problem only shows up when we run System Coupling cases.

 

Thanks!

 

Boyuan

rwoolhou posted this 08 November 2018

System coupling will automatically open & close tools as needed: I'm not sure if the system will treat these differently to manually opened files. 

Would it be possible to try a small parametric run (just set up a pipe with variable diameter), output the mass flow, and see if that works? You'd only need a few thousand cells and it'll complete in under 30 minutes. IT can then monitor the system to see what is being called.
