Oracle Client ORA-03113 error hangs my windows service

  • Thread starter: Guest

I have a Windows Service which sniffs an Oracle 9i queue table every X
milliseconds and processes any jobs it finds. I am using the Exception
Handling Application Block to log Oracle Errors to Email, Event Log, and Text
File and then continue processing.

With certain Oracle errors, such as when my table doesn't exist, I'm able to
handle the exception and move on to the next loop. When I get an ORA-03113
"end-of-file on communication channel" error, though, the error is logged
but the service appears to hang. Is there anything I need to do to recover
gracefully from this exception? I am certain that in every ORA-03113 case
the cause was the Oracle server being bounced, but if this happens in
production I'd like to be able to recover without having someone in support
restart my Windows service.

Any ideas out there? Has anyone experienced this before where the ORA-03113
hangs a Windows Service?

Thanks in advance,

Themanfromsql
 
Where is it hanging? If you're logging the error I'm guessing that it's
hanging when you try to open a subsequent connection. Are you able to
replicate this? How are you connecting to the server (provider, sqlnet
params, etc). I ran into a similar problem after Oracle errors when
connecting with the Microsoft managed provider for Oracle and using an
Oracle nameserver for name resolution.
 
Hi,

I did some research, but I don't think it's the Oracle error that hangs
the Windows service. Could you show us how you handle these exceptions? Are
you doing this on a single thread or on multiple threads?

Kevin Yu
Microsoft Online Community Support

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================

(This posting is provided "AS IS", with no warranties, and confers no
rights.)
 
Hi Kevin,

I did some research into this, too, and I think it might be related to this
known issue:

http://support.microsoft.com/default.aspx?scid=kb;en-us;830173

Both my server and my laptop have v 1.1 (no SP) with a couple of unrelated
patches. Here's the strange thing, though: the behavior I noticed when the
service was running on the server is that the service throws the error once
and then never recovers completely. On my laptop I get the behavior
described in the above article where I first get a 3113 error, and then every
subsequent loop I get a 3114 error. Why would I get different behavior on
two different machines? Different OSes? Different versions of the .NET
Framework (workstation vs. standard?)? Multiple Processors?

To answer your question, Kevin, the service exception handling works
something like this:

1. Let a timer run for X seconds (currently X = 10)

on timer.elapsed:

2. Turn the timer off
try {
    3. Check a DB table for jobs to process (this is where the failure is)
    4. If job found, call a web service asynchronously, passing in the job
       info and a callback method within the windows service.
}
catch (exception) {
    5. Use the EHAB to log the error to event log, file, and email.
    6. Based on the EHAB policy file, do not bubble up the exception.
}
7. Turn the timer back on.
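
The steps above can be sketched as follows. This is a minimal,
language-neutral sketch in Python rather than the service's actual .NET code;
FakeTimer, fetch_jobs, dispatch, and log_error are hypothetical names. One
possibility worth ruling out, and it is only an assumption: if step 5 itself
throws (say the email sink fails while the network is down), step 7 never
runs and the timer stays off, which would look like a hang. Re-arming the
timer in a finally block guards against that.

```python
import logging

log = logging.getLogger("queue_service")


class FakeTimer:
    """Hypothetical stand-in for the service's timer; only tracks on/off."""

    def __init__(self):
        self.enabled = True

    def stop(self):
        self.enabled = False

    def start(self):
        self.enabled = True


def on_timer_elapsed(timer, fetch_jobs, dispatch, log_error):
    """One polling cycle: steps 2 to 7 of the list above."""
    timer.stop()                          # step 2
    try:
        for job in fetch_jobs():          # step 3: may raise a database error
            dispatch(job)                 # step 4: hand off to the web service
    except Exception as exc:
        try:
            log_error(exc)                # step 5: event log, file, email
        except Exception:
            # A failing log sink (for example email during a network outage)
            # must not escape the handler.
            log.exception("error logging failed")
    finally:
        timer.start()                     # step 7: always re-arm the timer
```

With this shape the timer is back on after every cycle, whether the query
failed, the logging failed, or both.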

I don't have enough information about where exactly the windows service
hung, but I do know that it was some time after the exception handling.
Because of the timer event and the asynchronous web service call, this
service does use multiple threads, but they are all handled by the .NET
thread pool.

Any thoughts?

Thanks,

TheManFromSql
 
Hi,

The error code is generated by the database, so I don't think it's your
application's problem. Are the two clients passing different parameters to
the same stored procedure, so that it produces different results?

Also, to eliminate the impact of the known issue described in the KB article,
I suggest you install .NET Framework 1.1 SP1. Please let me know if it still
doesn't work.

Kevin Yu
Microsoft Online Community Support

 
Hi Kevin,

Unfortunately, I don't have enough control over the environment to be able
to install SP1. There are several hot fixes that have been installed since
1.1, and the migration path currently is toward 2.0.

The statement is not a stored proc; the command is a simple "select
* from table".

Yesterday I added several trace statements to the service and tried to
recreate the scenario by asking one of our DBAs to kill a session the
Windows Service had in progress. Unfortunately, the ORA-00028 error that was
produced did not give me the same result (the service recovered fine and
picked up again with the next session). I also tried recreating the scenario
on my laptop (XP Pro) by unplugging the ethernet cable. The behavior I found
there matches the behavior described in the KB article (the service kept
spitting out ORA-03114 errors every time it went back to the database). The
difference probably has to do with the default machine.config settings on my
laptop.
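
One way to avoid looping on ORA-03114 after an ORA-03113 is to treat those
codes as "connection lost", throw the connection away, and reconnect with a
growing delay while the database is down. The sketch below is in Python, not
the service's .NET code, and is only an illustration under stated
assumptions: DbError, JobPoller, and the connect factory are hypothetical
names, and the connection object is assumed to expose an execute method.

```python
import time

# ORA codes that mean the session is gone and the connection must be discarded.
# 3113: end-of-file on communication channel; 3114: not connected to ORACLE;
# 28: your session has been killed.
CONNECTION_LOST = {28, 3113, 3114}


class DbError(Exception):
    """Hypothetical driver error carrying the numeric ORA code."""

    def __init__(self, code, message=""):
        super().__init__("ORA-%05d %s" % (code, message))
        self.code = code


class JobPoller:
    def __init__(self, connect, max_delay=60.0, sleep=time.sleep):
        self._connect = connect      # hypothetical factory returning a connection
        self._conn = None
        self._delay = 1.0
        self._max_delay = max_delay
        self._sleep = sleep

    def poll(self, query):
        """Run one query; return rows, or None if the database is unreachable."""
        try:
            if self._conn is None:
                self._conn = self._connect()
            rows = self._conn.execute(query)
            self._delay = 1.0            # healthy again: reset the backoff
            return rows
        except DbError as err:
            if err.code not in CONNECTION_LOST:
                raise                    # e.g. ORA-00942: caller logs and moves on
            self._conn = None            # never reuse the dead connection
            self._sleep(self._delay)     # wait before the next attempt
            self._delay = min(self._delay * 2, self._max_delay)
            return None
```

In a timer-driven service the wait would more naturally be done by
stretching the timer interval than by sleeping inside the handler. The point
of the sketch is only the split between errors that leave the connection
usable and errors that do not.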

Any other thoughts?
 
I would suggest that you use the Microsoft debugging tools, but if you don't
have access to install SP1 you probably don't have access to install
WinDbg either. Have you ensured that the client is connecting using tnsnames
and not an Oracle nameserver? Also, have you looked around on MetaLink at
all? I'm sure your DBA has an account there.
 
Hi,

Since the current behavior matches the behavior described in the KB
article, the best way to solve this issue is to apply the hotfix or the
Service Pack. Could someone who has permission on that machine install it?
I know you are concerned because you're migrating to .NET Framework 2.0,
but applying SP1 to .NET Framework 1.1 doesn't conflict with the
installation of 2.0.

If you have any other concerns, please let me know.

Kevin Yu
Microsoft Online Community Support
 
Hello,

This issue appears to be a different issue than I had previously thought. I
believe there is still something going on with the connection pool, but I
don't think it's being caused by the OracleClient namespace. I've created a
separate thread for the issue I am encountering.

Kevin, I consider this issue to be resolved.

Thanks for your help,

TheManFromSql
 
Hi Patrick,

You're welcome. Thanks for sharing your experience with all the people
here. If you have any questions, please feel free to post them in the
community.

Kevin Yu
Microsoft Online Community Support
 