Atomikos Forum

Repeated termination during recovery after checkpoint

Hi,

Using TansactionEssentials 3.7.0. I am experiencing a problem whereby I receive infinite warning messages like the following after restarting my application:


10 Feb 2012 12:06:22,546 WARN  atomikos Thread-2 - Local heuristic termination of coordinator dev1_localhost_8080_TM0002500001 with state COMMITTING
10 Feb 2012 12:08:20,329 WARN  atomikos Atomikos:3 - Local heuristic termination of coordinator dev1_localhost_8080_TM0002500001 with state HEUR_HAZARD
10 Feb 2012 12:08:20,482 WARN  atomikos Atomikos:3 - Local heuristic termination of coordinator dev1_localhost_8080_TM0002500001 with state HEUR_HAZARD
.
.
.

These messages appear repeatedly until I kill the application and start again after deleting the transaction log files. With com.atomikos.icatch.console_log_level=DEBUG, I can see that TM0002500001 was the last active transaction before the log rolled over to a new file (checkpoint). The earlier log shows this:

12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] createCompositeTransaction ( 3600000 ): created new ROOT transaction with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': getConnection ( null )...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': init...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': getConnection ( null )...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': init...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos connection pool 'appdbDataSource': current size: 1/1
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: updating last time acquired
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: no test query, skipping test
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: creating connection proxy...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: calling toString on vendor connection...
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: toString returning com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: returning proxy com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,007 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos connection pool 'appdbDataSource': got connection from pool, new size: 1/1
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: calling toString on vendor connection...
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: toString returning com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': returning com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: calling toString on vendor connection...
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: toString returning com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXADataSourceBean 'appdbDataSource': returning com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 entering state: ACTIVE
12-02-10 11:58:45,008 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] addParticipant ( com.atomikos.jdbc.nonxa.AtomikosNonXAParticipant@46950e5 ) for transaction dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,009 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: calling prepareStatement on vendor connection...
12-02-10 11:58:45,009 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: prepareStatement returning net.sf.log4jdbc.PreparedStatementSpy@4695e50
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: close...
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] ThreadLocalConnection: not reusable yet
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: close done.
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction()  returning instance with id dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] commit() done (by application) of transaction dev1_localhost_8080_TM0002500001
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 entering state: PREPARING
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 entering state: IN_DOUBT
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] prepare() of Coordinator  dev1_localhost_8080_TM0002500001 returning YES vote
12-02-10 11:58:45,056 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 entering state: COMMITTING
12-02-10 11:58:45,059 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: committing on connection...
12-02-10 11:58:45,060 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos non-xa connection proxy for com.microsoft.sqlserver.jdbc.SQLServerDriver@27d8134: resetting autoCommit to true
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] ThreadLocalConnection: detected reusability
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: notifying listener: atomikos connection pool 'appdbDataSource'
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] atomikos connection pool 'appdbDataSource': connection AtomikosNonXAPooledConnection became available, notifying potentially waiting threads
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: updating last time released
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] AtomikosNonXAPooledConnection: updating last time released
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 entering state: TERMINATED
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 : stopping timer...
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 : disposing statehandler TERMINATED...
12-02-10 11:58:45,061 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002500001 : disposed.
12-02-10 11:58:45,062 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Logfile closed: c:\workspace\myapp\target\tomcat\myapp\..\txlog\tmlog0.log
12-02-10 11:58:45,066 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction() returning NULL!
12-02-10 11:58:45,066 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction() returning NULL!
12-02-10 11:58:45,066 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] getCompositeTransaction() returning NULL!
12-02-10 11:58:45,067 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Coordinator dev1_localhost_8080_TM0002600001 entering state: ACTIVE
12-02-10 11:58:45,067 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] (1.5) executing task: com.atomikos.timing.PooledAlarmTimer@469d3b7
12-02-10 11:58:45,067 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] Creating composite transaction: dev1_localhost_8080_TM0002600001
12-02-10 11:58:45,067 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-2] createCompositeTransaction ( 3600000 ): created new ROOT transaction with id dev1_localhost_8080_TM0002600001
.
.
.

Note the "Logfile closed: c:\workspace\myapp\target\tomcat\myapp\..\txlog\tmlog0.log" line. After this log line, there were several more transactions which all terminated successfully. When I shut down tomcat, there were no pending transactions and the current transaction log file was tmlog1.log. I debugged the restore and found that the first SystemLogImage deserialized from tmlog1.log represented TM0002500001 with an InDoubtStateHandler, state=COMMITTING and forgettable=false. After reading all the lines from the log file, the StreamObjectLog's logTable_ Hashtable contained just this one SystemLogImage (TM0002500001). I continued and stopped at a breakpoint at TransactionServiceImp.java:610. The stack looks like this at this point:

    Daemon Thread [Thread-2] (Suspended (breakpoint at line 610 in TransactionServiceImp))    
    TransactionServiceImp.recover() line: 610    
    TransactionServiceImp.init(Properties) line: 741    
    StandAloneTransactionManager(BaseTransactionManager).init(TransactionServiceImp, Properties) line: 217    
    StandAloneTransactionManager.init(Properties) line: 104    
    UserTransactionServiceImp.init(TSInitInfo) line: 307    
    UserTransactionServiceImp.init(TSInitInfo) line: 413    
    UserTransactionServiceImp.init() line: 605    
    .
    .
    .

When this line calls coord.recover(), the "Local heuristic termination" messages start and are output repeatedly until I kill the application and delete the transaction log. I can repeat this problem by restarting the application soon after the log file has rolled over. Sometimes the log shows the suspect session terminating before the log file is closed (as we see here). But sometimes the "Logfile closed:" message appears after the transaction has terminated but before we see the session terminate. Could you please offer any advice?

Thanks,
Alex Williams Send private email
Friday, February 10, 2012
 
 
Sorry, that last sentence was meant to say "after the transaction has started but before the transaction terminates.".
Alex Williams Send private email
Friday, February 10, 2012
 
 
Hi,

The 'logfile closed' happens when we do checkpoints - i.e., cleanup the current log file and create a new, compacted one. That in itself should not be problematic.

Regarding the repeated warnings: could you be seeing the same as here: default82df.html?community.6.2615.3 ?

Thanks
Guy Pardon Send private email
Friday, February 10, 2012
 
 
Thanks for you reply Guy. This is not the same problem as in the other post (in fact the last post on that thread was posted by my colleague Alkesh on my behalf - forum registration wasn't working).

With the other problem, we only get 2 messages per transaction and then they stop. Those log messages are output when each SystemLogImage is deserialized from the log file. Because there is a forgettable_=true entry for each of the transactions, it doesn't actually try to recover any of them.

This new problem has more of an impact. I appreciate that the closing of the logfile is a normal part of checkpointing but I mentioned it because it seems significant. The transaction which is being logged over and over again is the same transaction and it is the last active transaction before the last checkpoint. It is a transaction which terminated successfully. However, because a single SystemLogImage exists in the new logfile (after the checkpoint) with forgettable_=false, it attempts to recover this transaction over and over and over again. I have to kill my application at this point and delete the transaction logfiles.

I don't know whether it is significant but this transaction had just one participant - an AtomikosNonXAParticipant. Could you please have another look.

Thanks,
Alex Williams Send private email
Friday, February 10, 2012
 
 
Thanks for clarifying!

It looks like you did some extensive analysis efforts on this - I created a bug report to look into it ASAP.
Guy Pardon Send private email
Sunday, February 12, 2012
 
 
I checked and you're right - it is a bug. In particular, the checkpoint will not take into account the last log image flushed to disk.

Thanks again for pointing this out!
Guy Pardon Send private email
Sunday, February 12, 2012
 
 
Thanks for the swift follow up and response Guy. Now you've confirmed it's a bug, I'll see if I can understand where the problem lies in the code. I'll also keep an eye on the defect in fogbugz.

Thanks again,
Alex
Alex Williams Send private email
Monday, February 13, 2012
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics