Fine Tune Your Polling and Batching in Mule ESB

They say it's best to learn from others. With that in mind, let's dive into a use case I recently ran into. We were dealing with a number of legacy systems when our company decided to shift to a cloud-based solution. Of course, we had to prepare for the move — and all the complications that came with it.

Use Case

We have a legacy system built on Oracle DB, using Oracle Forms to create applications and lots and lots of stored procedures in the database. It has been in use for over 17 years now with no major upgrades or changes. Of course, there have been many development changes over those 17 years, which have taken the system close to its breaking point and made it almost impossible to implement anything new. So the company decided to move to a CRM (Salesforce), and we needed to transfer data from our legacy database to SF. However, we couldn't create any triggers on our database to send real-time data to SF during the transition period.

Solution

We decided to use Mule Poll to poll our database, get the records in bulk, and then send them to SF using the Salesforce Mule connector.

I am assuming that we are all clear about polling in general. If not, please refer to the references at the end. Also, if you are not familiar with Mule's polling implementation, there are a few references at the bottom, too. Sounds simple enough, doesn't it? But wait, there are a few things to consider:

- What is the optimum frequency for your polls?
- How many threads should each poll have? How many active or inactive threads do you want to keep?
- How many polls can we write before we break the object store and queue store that Mule uses to maintain polling?
- What is the impact on the server file system if you use watermark values in the object store?
- How many records can we fetch in one query from the database?
- How many records can we actually send in bulk to Salesforce using the SFDC connector?

These are a few, if not all, of the considerations you have to weigh before implementation. The major part of polling is the WATERMARK and how Mule implements it on the server.

Polling for Updates Using Watermarks

Rather than polling a resource for all its data with every call, you may want to acquire only the data that has been newly created or updated since the last call. To acquire only new or updated data, you need to keep a persistent record of either the item that was last processed or the time at which your flow last polled the resource. In the context of Mule flows, this persistent record is called a watermark.

To achieve this persistence, Mule ESB stores watermarks in the object store of the project's runtime directory on the ESB server. Depending on the type of object store you have implemented, you may have a SimpleMemoryObjectStore or a TextFileObjectStore; samples of both configurations are sketched below. For any kind of object store, Mule ESB creates files on the server, and if the frequency of your polls is not carefully configured, you may run into file storage issues.
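To make this concrete, here is a hedged sketch of the two object-store flavors and of a poll that uses one of them for its watermark. The bean IDs, flow name, table, and connector configuration names are hypothetical, and the Mule 3 class names (org.mule.util.store.SimpleMemoryObjectStore and org.mule.util.store.TextFileObjectStore) and watermark attributes should be verified against your runtime version:

```xml
<!-- Sketch: in-memory object store (watermark values do not survive a restart). -->
<spring:beans>
    <spring:bean id="inMemoryStore"
                 class="org.mule.util.store.SimpleMemoryObjectStore"/>
</spring:beans>

<!-- Sketch: text-file object store that persists watermark values to disk. -->
<spring:beans>
    <spring:bean id="textFileStore"
                 class="org.mule.util.store.TextFileObjectStore">
        <spring:property name="directory" value="${mule.working.dir}/objectstore"/>
    </spring:bean>
</spring:beans>
```

A poll can then reference the store for its watermark:

```xml
<flow name="pollOrdersFlow">
    <poll doc:name="Poll">
        <fixed-frequency-scheduler frequency="60" timeUnit="SECONDS"/>
        <!-- Persist the last processed ID between polls in the store declared above. -->
        <watermark variable="lastProcessedId"
                   default-expression="#[0]"
                   update-expression="#[payload.isEmpty() ? flowVars.lastProcessedId : payload[payload.size() - 1].ID]"
                   object-store-ref="textFileStore"/>
        <db:select config-ref="Oracle_Configuration" doc:name="Fetch new rows">
            <db:parameterized-query><![CDATA[
                SELECT * FROM orders WHERE id > #[flowVars.lastProcessedId] ORDER BY id
            ]]></db:parameterized-query>
        </db:select>
    </poll>
    <!-- ... send the fetched records to Salesforce (see the batch sketch below) ... -->
</flow>
```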
For example, if you are running your poll every 10 seconds with multiple threads, and your flow takes more than 10 seconds to send data to SF, then a new object store entry is made to persist the watermark value for each flow trigger, and you end up with too many files in the server object store.

To set these values, we have to consider how many records we are fetching from the database, because SF has a limit of 200 records that you can send in one bulk call. So, if you are fetching 2,000 records, one batch will call SF 10 times to transfer those 2,000 records. If your flow takes five seconds to process 200 records, including the network round trip to send data to SF and come back, then your complete poll will take around 50 seconds to transfer 2,000 records. If our polling frequency is 10 seconds, it means we are piling up the object store.

Another issue that will arise is the queue store. Because there is a big gap between the polling frequency and the execution time, the queue store will also keep queuing. Again, you have to deal with too many files.

To resolve this, it's always a good idea to fine-tune the flow's execution time and the polling frequency to keep the gap small. To manage the threads, you can use Mule's batch threading profile to control how many threads you want to run and how many you want to keep active (a minimal sketch follows the references below).

I hope these details help you set up your polling in a better way. There are a few more things we have to consider. What happens when an error occurs while sending data? What happens when SF gives you an error and can't process your data? What about the types of errors SF will send you? How do you rerun your batch with the watermark value if it failed? What about logging and recovery? I will try to cover these issues in a second blog post.

References:

https://docs.mulesoft.com/mule-user-guide/v/3.6/poll-reference#polling-for-updates-using-watermarks
https://docs.mulesoft.com/mule-user-guide/v/3.7/poll-reference
https://docs.mulesoft.com/mule-user-guide/v/3.7/poll-schedulers#fixed-frequency-scheduler
https://en.wikipedia.org/wiki/Polling_(computer_science)
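For completeness, here is the minimal sketch of the batch threading profile mentioned above. The job name, Salesforce objects, and connector configuration are hypothetical; batch:threading-profile and batch:commit are the Mule 3 batch module's knobs, but verify the attribute names against your runtime version:

```xml
<batch:job name="ordersToSalesforceBatch">
    <!-- Cap concurrent threads so flow execution time stays close to the poll frequency. -->
    <batch:threading-profile maxThreadsActive="4"
                             maxThreadsIdle="2"
                             poolExhaustedAction="WAIT"/>
    <batch:process-records>
        <batch:step name="upsertToSalesforce">
            <!-- Commit in chunks of 200 to respect the SF bulk limit discussed above. -->
            <batch:commit size="200" doc:name="Batch Commit">
                <sfdc:upsert config-ref="Salesforce_Config"
                             type="Order__c"
                             externalIdFieldName="External_Id__c"
                             doc:name="Upsert orders"/>
            </batch:commit>
        </batch:step>
    </batch:process-records>
</batch:job>
```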


SpaceX rocket lifts off on cargo run, then lands at launch site

CAPE CANAVERAL, Fla. An unmanned SpaceX rocket blasted off from Florida early on Monday to send a cargo ship to the International Space Station, then turned around and landed itself back at the launch site.

The 23-story-tall Falcon 9 rocket, built and flown by Elon Musk's Space Exploration Technologies, or SpaceX, lifted off from Cape Canaveral Air Force Station at 12:45 a.m. EDT (0445 GMT).

Perched on top of the rocket was a Dragon capsule filled with nearly 5,000 pounds (2,268 kg) of food, supplies and equipment, including a miniature DNA sequencer, the first to fly in space.

Also aboard the capsule was a metal docking ring, 7.8 feet (2.4 m) in diameter, that will be attached to the station, letting commercial spaceships under development by SpaceX and Boeing Co. ferry astronauts to the station, a $100-billion laboratory that flies about 250 miles (400 km) above Earth. The manned craft are scheduled to begin test flights next year.

Since NASA retired its fleet of space shuttles five years ago, the United States has depended on Russia to ferry astronauts to and from the station, at a cost of more than $70 million per person.

As the Dragon cargo ship began its two-day journey to the station, the main section of the Falcon 9 booster rocket separated and flew itself back to the ground, touching down a few miles south of its seaside launch pad, accompanied by a pair of sonic booms.

"Good launch, good landing, Dragon is on its way," said NASA mission commentator George Diller.

Owned and operated by Musk, the technology entrepreneur who founded Tesla Motors Inc, SpaceX is developing rockets that can be refurbished and re-used, potentially slashing launch costs. With Monday's touchdown, SpaceX has successfully landed Falcon rockets on the ground twice and on an ocean platform during three of its last four attempts.

SpaceX intends to launch one of its recovered rockets as early as this autumn, said Hans Koenigsmann, the firm's vice president for mission assurance.

(Reporting by Irene Klotz; Editing by Chris Michaud and Clarence Fernandez)


1 in 16 Java Components Have Security Defects

Sonatype just released its second annual State of the Software Supply Chain Report. Over the past year, researchers amassed a great deal of data on the staggering volume and variety of Java (as well as NuGet, RubyGems, and npm) open source components flowing through software supply chains into development environments. This year, the report assessed behaviors across 3,000 organizations and performed deep analysis on over 25,000 applications.

The results we discovered ranged from staggering to surprising to sobering. For example, researchers measured organizations consuming an average of 229,000 components annually. The good news is that these components help companies accelerate their development and innovation. At the same time, we saw 6.8% of components used in applications marked with at least one known security vulnerability, adding high levels of security debt. Not all components are created equal.

In the past year, Sonatype was far from the only organization pursuing the need for improved software supply chain practices. The researchers studied the patterns and practices exhibited by high-performance organizations and documented how these innovators are utilizing the principles of software supply chain automation to manage the massive flow and variety of open source components. These organizations are striving to consistently deliver higher-quality applications for less, while lowering their risk profile. This year's report profiles organizations across the banking, insurance, defense, energy, technology, and government sectors.

The 2016 State of the Software Supply Chain Report blends public and proprietary data with expert research and analysis to reveal the following:

- Developers are gorging on an ever-expanding supply of open source components. Billions of open source components were downloaded in the last year.
- Vast networks of open source component suppliers are growing rapidly. Over 1,000 new open source projects and 10,000 new versions of open source components are introduced daily.
- The massive volume and variety of software components vary widely in terms of quality. 1 in 16 parts includes a known security defect.
- Top-performing enterprises, federal regulators, and industry associations have embraced the principles of software supply chain automation to improve the safety, quality, and security of software.

If you are developing with Java or other open source components, we invite you to read the report and leverage the insights to understand how your organization's practices compare to others. If you would like to join a live discussion of this year's report, you can hear from the research team on Wednesday, July 13th. Save your seat here.


The Life of a Serverless Microservice on AWS

In this post, I will demonstrate how you can develop, test, deploy, and operate a production-ready serverless microservice using the AWS ecosystem. The combination of AWS Lambda and Amazon API Gateway allows us to operate a REST endpoint without the need for any virtual machines. We will use Amazon DynamoDB as our database, Amazon CloudWatch for metrics and logs, and AWS CodeCommit and AWS CodePipeline as our delivery pipeline. In the end, you will know how to wire together a bunch of AWS services to run a system in production.

The Life

My idea of "The Life of a Serverless Microservice on AWS" is best described by this figure:

A developer pushes code changes to a repository. This git push triggers the CI & CD pipeline to deploy a new version of the service, which our users consume. The load generated on the system produces logs and metrics that are used by the developer to operate the system. The operational feedback is used to improve the quality of the system.

What is Serverless?

Serverless, or Function as a Service (FaaS), describes the idea that the deployment unit is a single function. A function takes input and returns output. The responsibility of the FaaS user is to develop the function, while the FaaS provider's responsibility is to execute the function whenever some event happens. The following figure demonstrates this idea.

Some possible events:

- File uploaded.
- E-mail received.
- Database changed.
- Manually invoked.
- HTTP API called.
- Cron.

The cool things about serverless architecture are:

- You only pay when the function is executed.
- No under/over provisioning.
- No boot time.
- No patching.
- No SSH.
- No load balancing.

Read more about Serverless Architectures if you are interested in the details.

What is a Microservice?

Imagine a small system where users have a publicly visible profile page with location information for that user. The idea of a microservice architecture is that you slice your system into smaller units around bounded contexts. I identified three of them:

- Authentication Service: Handles authentication.
- Location Service: Manages location information via a private HTTP API. Uses the Authentication Service internally to authenticate requests.
- Profile Service: Stores and retrieves the profile via a public HTTP API. Makes an internal call to the Location Service to retrieve the location information.

Each service gets its own database, and services are only allowed to communicate with each other over well-defined APIs, not through the database!

Let's get started!

The source code and installation instructions can be found at the bottom of this page. Please use the us-east-1 region! We will use services that are not available in other AWS regions at the moment.

Code

AWS CodeCommit is a hosted Git repository that uses IAM for access control. You need to upload your public SSH key to your IAM user as shown in the following figure:

Creating a repository is simple. Just click on the Create new Repository button in the AWS Management Console. We need a repository for each service. You can then clone the repository locally with the following command. Replace $SSHKeyID with the SSH key ID of your IAM user and $RepositoryName with the name of your repository:

```
git clone ssh://$SSHKeyID@git-codecommit.us-east-1.amazonaws.com/v1/repos/$RepositoryName
```

We now have a home for our code.

Continuous Integration & Continuous Delivery

AWS CodePipeline is a service to manage a build and deployment pipeline. CodePipeline itself is only responsible for triggering integrations to do things like:

- Build.
- Test.
- Deploy.

We need a pipeline for each service that:

- Downloads the sources from CodeCommit if something changes there.
- Runs our tests and bundles the code in a zip file for Lambda.
- Deploys the zip file.

Luckily, CodePipeline has native support for downloading sources from CodeCommit. To run our tests, we will use a third-party integration to trigger Solano CI to run our tests and bundle the source files. The deployment step is implemented in a Lambda function that triggers a CloudFormation stack update (a sketch follows at the end of this section). A CloudFormation stack is a bunch of AWS resources managed by CloudFormation based on a template that you provide (Infrastructure as Code). Read more about CloudFormation on our blog.

The following figure shows the pipeline:

The cool thing about CloudFormation is that you can define the pipeline itself in a template. So we get Pipeline as Code. The CloudFormation template that is used for service deployment describes a Lambda function, a DynamoDB database, and an API Gateway. After deployment, you will see one CloudFormation stack for each service.

We now have a CI & CD pipeline.
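To make the deployment step more concrete, here is a hedged sketch (not the post's actual implementation, which ships with the source code) of a Node.js deployment function that CodePipeline invokes to trigger the CloudFormation stack update. The stack name, template URL, and SNS-free error handling are assumptions:

```javascript
// Hypothetical deployment Lambda: CodePipeline invokes it, it updates the
// service's CloudFormation stack and reports the job result back.
var AWS = require('aws-sdk');
var cloudformation = new AWS.CloudFormation();
var codepipeline = new AWS.CodePipeline();

exports.handler = function(event, context, cb) {
  var jobId = event['CodePipeline.job'].id; // provided by CodePipeline
  cloudformation.updateStack({
    StackName: 'profile-service',                                       // hypothetical stack name
    TemplateURL: 'https://s3.amazonaws.com/my-bucket/service.template', // hypothetical template location
    Capabilities: ['CAPABILITY_IAM']
  }, function(err) {
    if (err) {
      codepipeline.putJobFailureResult({
        jobId: jobId,
        failureDetails: {type: 'JobFailed', message: String(err)}
      }, function() { cb(err); });
    } else {
      codepipeline.putJobSuccessResult({jobId: jobId}, function() { cb(null); });
    }
  });
};
```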
Service

We use a bunch of AWS services to run our microservices.

Amazon API Gateway

API Gateway is a service that offers a configurable REST API as a service. You describe what should happen if a certain HTTP method (GET, POST, PUT, DELETE, ...) is called on a certain HTTP resource (e.g. /user). In our case, we want to execute a Lambda function if an HTTP request comes in. API Gateway also takes care of mapping input and output data between formats. The following figure shows how this looks in the AWS Management Console for the Profile Service.

API Gateway is a fully managed service. You only pay for requests: no under/over provisioning, no boot time, no patching, no SSH, no load balancing. AWS takes care of all those aspects. Read more about API Gateway on our blog.

AWS Lambda

To run code in AWS Lambda, you need to:

- Use one of the supported runtimes (Node.js (JavaScript), Python, JVM (Java, Scala, ...)).
- Implement a predefined interface.

The interface, in abstract terms, requires a function that takes an input parameter and returns void, returns something, or throws an error. We will use the Node.js runtime, where a function implementation looks like this:

```javascript
exports.handler = function(event, context, cb) {
  console.log(JSON.stringify(event));
  // TODO do something
  cb(null, {name: 'Michael'});
};
```

In Node.js, the function is not expected to return something. Instead, you need to call the callback function cb that is passed into the function as a parameter. The following figure shows how this looks in the AWS Management Console for the Profile Service.

AWS Lambda is a fully managed service. You only pay for function executions: no under/over provisioning, no boot time, no patching, no SSH, no load balancing. AWS takes care of all those aspects. Read more about Lambda on our blog.
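As an illustration of how such a handler might grow into a real service function, here is a hedged sketch of a Profile Service handler that looks up a profile in DynamoDB. The table name, key schema, and event shape are assumptions, not the post's actual code:

```javascript
// Hypothetical Profile Service handler: fetches a profile item from DynamoDB.
var AWS = require('aws-sdk');
var db = new AWS.DynamoDB.DocumentClient();

exports.handler = function(event, context, cb) {
  db.get({
    TableName: 'profile',            // assumed table name
    Key: {id: event.userId}          // assumed key schema and input mapping
  }, function(err, data) {
    if (err) {
      cb(err);                       // API Gateway maps this to an error response
    } else if (!data.Item) {
      cb(new Error('profile not found'));
    } else {
      cb(null, data.Item);           // becomes the HTTP response body
    }
  });
};
```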
Amazon DynamoDB

DynamoDB is a key-value store or document store. You can look up values by their key. DynamoDB replicates across multiple Availability Zones (data centers) and is eventually consistent. The following figure shows how this looks in the AWS Management Console for the Authentication Service.

Amazon DynamoDB is a 99% managed service. The 1% that is up to you is that you need to provision read and write capacity. When your service makes more requests than provisioned, you will see errors. So it is your job to monitor the consumed capacity and increase the provisioned capacity before you run out of it. Read more about DynamoDB on our blog.

Request Flow

The three services work together in the following way: the user's HTTP request hits API Gateway. API Gateway checks if the request is valid; if so, it invokes the Lambda function. The function makes one or more requests to the database and executes some business logic. The result of the function is then transformed into an HTTP response by API Gateway.

We now have an environment to run our microservices.

Logs, Metrics, and Alerting

A black box is very hard to operate. That's why we need as much information from the inside of the system as possible. AWS CloudWatch is the right place to store and analyze this kind of information:

- Metrics (numbers).
- Logs (text).

CloudWatch also lets you define alarms on metrics. The following figure demonstrates how the pieces work together.

Operational insights that you get out of the box:

- Lambda writes STDOUT and STDERR to CloudWatch Logs.
- Lambda publishes metrics to CloudWatch about the number of invocations, runtime duration, the number of failures, etc.
- API Gateway publishes metrics about the number of requests, 4XX and 5XX response codes, etc.
- DynamoDB publishes metrics about consumed capacity, the number of requests, etc.

The following figure shows a CloudWatch alarm that is triggered if the number of throttled read requests of the Location Service DynamoDB table is greater than or equal to one. This situation indicates that the provisioned capacity is not sufficient to serve the traffic (a sketch of creating such an alarm programmatically closes this post).

With all those metrics and alarms in place, we can now be confident that we will receive an alert if our system is not working properly.

Summary

You can run a high-quality system on AWS by using only managed services. This approach frees you from many operational tasks that are not directly related to your service. Think of operating a monitoring system, a log index system, a database, virtual machines, etc. Instead, you can focus on operating and improving your service's code.

The following figure shows the overall architecture of our system:

Serverless or FaaS does not force you to use a specific framework. As long as you are fine with the interface (a function with input and output), you can do whatever you want inside your function to produce an output with the given input.
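As the closing illustration promised above, here is a hedged sketch of creating the throttled-reads alarm with the AWS SDK rather than the console. The alarm name, table name, and SNS topic ARN are hypothetical; ReadThrottleEvents is DynamoDB's published metric for throttled reads:

```javascript
// Hypothetical one-off script: alarm when the Location Service table sees
// one or more throttled read requests within a minute.
var AWS = require('aws-sdk');
var cloudwatch = new AWS.CloudWatch();

cloudwatch.putMetricAlarm({
  AlarmName: 'location-db-read-throttles',               // hypothetical name
  Namespace: 'AWS/DynamoDB',
  MetricName: 'ReadThrottleEvents',
  Dimensions: [{Name: 'TableName', Value: 'location'}],  // assumed table name
  Statistic: 'Sum',
  Period: 60,
  EvaluationPeriods: 1,
  Threshold: 1,
  ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  AlarmActions: ['arn:aws:sns:us-east-1:123456789012:ops-alerts'] // hypothetical SNS topic
}, function(err) {
  if (err) console.error('failed to create alarm', err);
});
```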


Solar plane lands in Spain after three-day Atlantic crossing

SEVILLE, Spain An airplane powered solely by the sun landed safely in Seville in Spain early on Thursday after an almost three-day flight across the Atlantic from New York, one of the longest legs of the first-ever fuel-less flight around the world.

The single-seat Solar Impulse 2 touched down shortly after 7.30 a.m. local time in Seville after leaving John F. Kennedy International Airport at about 2.30 a.m. EDT on June 20. The flight of just over 71 hours was the 15th leg of the round-the-world journey by the plane, piloted in turns by Swiss aviators Bertrand Piccard and Andre Borschberg.

"Oh-la-la, absolutely perfect," Piccard said after landing, thanking his engineering crew for their efforts.

With a cruising speed of around 70 kilometers an hour (43 miles per hour), similar to an average car, the plane has more than 17,000 solar cells built into wings with a span bigger than that of a Boeing 747.

(Reporting by Marcelo Pozo; Writing by Paul Day; Editing by Gopakumar Warrier)
