Thursday, January 19, 2012

Fault Policy SOA Suite 11g - Configuration and Deployments

Hi All,


Building the happy path of a service flow is the first priority in any service delivery methodology. But the real pain starts when faults need to be dealt with. The Fault Management Framework in SOA Suite 11g lets us manage the faults of a composite application centrally. A service delivery that includes multiple services can share one centralized fault management framework. This makes the delivery more configuration driven, makes fault handling less painful to manage, and keeps it declarative in nature: "if this happens, do this!"

In this post, I will concentrate on a few aspects of central management of the fault policy for SOA Suite 11g composite apps. There are two aspects of fault policy in 11g composites:


1. Handling BPEL faults
2. Handling Mediator Faults

Fault Policy and Fault Binding are the two files which I will discuss in detail later. These files make up the fault management framework for SOA Suite 11g. Both the fault policy and the fault binding must comply with their respective schemas; otherwise the fault management framework [FMF] will not be able to parse them.


I will divide this post into the following parts:

1. Managing Fault Policy Framework

2. Applying Fault Policy - A few use cases

3. Deploying fault policy - Test Case

4. Sample Fault Policy


Fault Policy Framework for 11g SOA Suite:

A detailed description of handling BPEL faults is available in the Oracle documentation.

A few highlights of the fault handlers in SOA Suite 11g (11.1.1.1.5):
  • The BPEL fault policy works only on invocation failures, not on custom faults thrown inside the process itself. Suppose there are two BPEL processes, BPEL-A and BPEL-B, where BPEL-A calls BPEL-B. If a business fault is thrown within BPEL-A, the fault framework does not act on it. But if BPEL-B throws a fault that is caught at the invocation level of BPEL-A, the fault framework becomes active. A bit sketchy, but yeah, that's how it works!
  • By default, the fault framework allows us to take the following actions on occurrence of a fault:

      • Human Intervention [humanIntervention] : Reports the fault to the error recovery queue. The support team can log in to the Enterprise Manager Console, find the instance awaiting recovery and resubmit it.
      • Rethrow [rethrowFault] : Bubbles the fault up to the caller service, which allows the client to handle the fault.
      • Termination [abort] : Terminates the process instance, which is then dehydrated. No further recovery action can be taken on that instance.
      • Replay Fault [replayScope] : Replays the scope in which the fault occurred.
      • Custom Java Action [javaAction] : Invokes a custom Java class when we want to handle the fault in a way the built-in actions do not cover.
      • Retry [retry] : Retries the failed invocation. This action has further child elements under it:

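Serial No | Element Name       | Description                                                                        | Mandatory
----------+--------------------+------------------------------------------------------------------------------------+----------
1         | retryCount         | Number of retries permissible                                                      | YES
2         | retryInterval      | Interval between successive retries *                                              | YES
3         | exponentialBackoff | Implements exp-backoff algorithm **                                                | NO
4         | retryFailureAction | What action needs to be taken in case of failure [Advisable to use this element]  | NO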

* If you set the Retry Interval in the fault policy to a duration less than 30 seconds, then the retry may not happen within the specified intervals. This is because the default value of the org.quartz.scheduler.idleWaitTime property is 30 seconds, and the scheduler waits for 30 seconds before retrying for available triggers, when the scheduler is otherwise idle. If the Retry Interval is set to a value less than 30 seconds, then latency is expected.

If you want the system to use a retry interval that is less than 30 seconds, then add the following property under the section <property name="quartzProperties"> in the fabric-config-core.xml file:

org.quartz.scheduler.idleWaitTime=<value>
** Exponential backoff indicates that the next retry attempt is scheduled at 2 x the delay, where delay is the current retry interval. For example, if the current retry interval is 2 seconds, the next retry attempt is scheduled at 4, the next at 8, and the next at 16 seconds until the retryCount value is reached.
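
To make the retry settings a bit more concrete, here is a minimal Python sketch that reads a retry action and works out the delay before each attempt. It is only an illustration of the doubling rule described above, not how the engine schedules retries internally. The action ids (ora-retry, ora-human-intervention), the values and the namespace URI are my assumptions for the example; check them against the schema shipped with your installation. I am also assuming the first retry waits for the plain retryInterval.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for fault policy files; verify against your own schema.
NS = {"fp": "http://schemas.oracle.com/bpel/faultpolicy"}

# Hypothetical retry action using the four child elements from the table.
RETRY_ACTION = """
<Action id="ora-retry" xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <retry>
    <retryCount>4</retryCount>
    <retryInterval>2</retryInterval>
    <exponentialBackoff/>
    <retryFailureAction ref="ora-human-intervention"/>
  </retry>
</Action>
"""


def retry_delays(action_xml):
    """Return the delay (in seconds) before each retry attempt."""
    retry = ET.fromstring(action_xml).find("fp:retry", NS)
    count = int(retry.findtext("fp:retryCount", namespaces=NS))
    interval = int(retry.findtext("fp:retryInterval", namespaces=NS))
    # exponentialBackoff is an empty marker element: present means "on".
    backoff = retry.find("fp:exponentialBackoff", NS) is not None

    delays = []
    for _ in range(count):
        delays.append(interval)
        if backoff:
            interval *= 2  # next attempt is scheduled at 2 x the current delay
    return delays


print(retry_delays(RETRY_ACTION))  # [2, 4, 8, 16]
```

Dropping the exponentialBackoff element from the sample gives a flat schedule of 2 seconds per attempt, which is where the 30 second idleWaitTime note above starts to matter.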

Handling Mediator Faults
Handling Mediator faults is pretty much the same as what we have for BPEL. The only thing to note is that Mediator fault handling works only if the service is invoked in parallel (a parallel routing rule). Otherwise Mediator faults are never picked up by the fault handler. In case of a sequential call, it is up to the client to handle the fault. Quite dodgy!
Mediator faults are always thrown with the qualified name {http://schemas.oracle.com/mediator/faults}mediatorFault
There are predefined Mediator error codes, which I list below. How to use them in a fault policy will be covered in my next blog.

Mediator Pre-defined Error Codes


The following list describes various error groups contained in the TYPE_ALL error group:
  • TYPE_DATA: Contains errors related to data handling.
    • TYPE_DATA_ASSIGN: Contains errors related to data assignment.

    • TYPE_DATA_FILTERING: Contains errors related to data filtering.

    • TYPE_DATA_TRANSFORMATION: Contains errors that occur during transformation.

    • TYPE_DATA_VALIDATION: Contains errors that occur during payload validation.


  • TYPE_METADATA: Contains errors related to Mediator metadata.
    • TYPE_METADATA_FILTERING: Contains errors that occur while processing the filtering conditions.

    • TYPE_METADATA_TRANSFORMATION: Contains errors that occur during getting the metadata for transformation.

    • TYPE_METADATA_VALIDATION: Contains errors that occur during validation of metadata for Mediator (.mplan file).

    • TYPE_METADATA_COMMON: Contains other errors that occur during the handling of metadata.


  • TYPE_FATAL: Contains fatal errors that are not easily recoverable.
    • TYPE_FATAL_DB: Contains database related fatal errors, such as Datasource not found error.

    • TYPE_FATAL_CACHE: Contains Mediator cache-related fatal errors.

    • TYPE_FATAL_ERRORHANDLING: Contains fatal errors that occur during error handling such as Resubmission queues not available.

    • TYPE_FATAL_MESH: Contains fatal errors from the Service Infrastructure such as Invoke service not available.

    • TYPE_FATAL_MESSAGING: Contains fatal messaging errors arising from the Service Infrastructure.

    • TYPE_FATAL_TRANSACTION: Contains fatal errors related to transactions such as Commit can't be called on a transaction which is marked for rollback.

    • TYPE_FATAL_TRANSFORMATION: Contains fatal transformation errors such as error occurring because of the XPath functions used in a transformation.

  • TYPE_TRANSIENT: Contains transient errors that can be recovered on retrying.
    • TYPE_TRANSIENT_MESH: Contains errors related to the Service Infrastructure.

    • TYPE_TRANSIENT_MESSAGING: Contains errors related to JMS such as enqueue, dequeue.

  • TYPE_INTERNAL: Contains internal errors.
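
The grouping above is easier to reason about as a small lookup table. The Python sketch below only models the list in this post; it is not how Mediator evaluates a policy. The function names are made up for the example, and I am assuming TYPE_INTERNAL is a top-level group of its own, next to TYPE_DATA, TYPE_METADATA, TYPE_FATAL and TYPE_TRANSIENT.

```python
# Top-level error groups and their members, as listed in this post.
# TYPE_ALL is the implicit root that contains every group.
ERROR_GROUPS = {
    "TYPE_DATA": [
        "TYPE_DATA_ASSIGN", "TYPE_DATA_FILTERING",
        "TYPE_DATA_TRANSFORMATION", "TYPE_DATA_VALIDATION",
    ],
    "TYPE_METADATA": [
        "TYPE_METADATA_FILTERING", "TYPE_METADATA_TRANSFORMATION",
        "TYPE_METADATA_VALIDATION", "TYPE_METADATA_COMMON",
    ],
    "TYPE_FATAL": [
        "TYPE_FATAL_DB", "TYPE_FATAL_CACHE", "TYPE_FATAL_ERRORHANDLING",
        "TYPE_FATAL_MESH", "TYPE_FATAL_MESSAGING", "TYPE_FATAL_TRANSACTION",
        "TYPE_FATAL_TRANSFORMATION",
    ],
    "TYPE_TRANSIENT": ["TYPE_TRANSIENT_MESH", "TYPE_TRANSIENT_MESSAGING"],
    "TYPE_INTERNAL": [],  # assumed to be a top-level group with no members
}


def top_level_group(code):
    """Return the top-level group that contains the given error code."""
    for group, members in ERROR_GROUPS.items():
        if code == group or code in members:
            return group
    raise KeyError(code)


def matches(policy_group, code):
    """True if an error code falls under the group named in a policy."""
    return (
        policy_group == "TYPE_ALL"
        or policy_group == code
        or top_level_group(code) == policy_group
    )


print(matches("TYPE_FATAL", "TYPE_FATAL_DB"))      # True
print(matches("TYPE_TRANSIENT", "TYPE_FATAL_DB"))  # False
```

The practical point is that a rule written against TYPE_TRANSIENT covers both the mesh and messaging variants, while a rule written against TYPE_ALL covers everything.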

Please refer to my next blog for the implementation of the fault policy as a service.

6 comments:

  1. Could you please provide a video sample on the fault policy mechanism in SOA Suite 11g?

    venkatasai0101@gmail.com

    I am confused about this concept.


  2. Hi, could you share sample code for BPEL faults?

    1. Nagarur,

      Oracle has a good blog and sample code -
      https://blogs.oracle.com/ateamsoab2b/entry/fault_management_framework_by_example

      Cheers
      SM

  3. Thanks for this post.
    I have a question here. How can I catch business faults that are not associated with any invoke using the fault policy file?
    I have used a switch case similar to this example and thrown a business fault. The instance is getting faulted, but the fault policy is not able to catch the fault.

    Regards,
    Debarshi

    1. Debarshi,

      The fault policy intercepts on invocation failure, basically whenever there is an integration issue.

      Why would you need a fault policy when it's a business fault? If it is needed at all, I suppose you would have a mediation for your process. That component should be trapping the fault.

      Hope it helps.

      Cheers
      SM
