When evaluating different rule engine vendors, you will probably encounter references to the RETE algorithm by Charles Forgy. This algorithm has proven to scale well for very large rulesets.
In all my years of experience, I've never encountered a client with more than 500 rules in a single rule policy. The reason is not a scalability limit of inference engines. It is much simpler: it is difficult for a business expert to

  • manage a very large rule policy
  • maintain a very large rule policy
  • verify and validate a large rule policy

As with any complex problem, a divide-and-conquer strategy works very well: split the large policy into smaller parts.
But if you like to push the limits and want to see how well the BizTalk inference engine scales, you might like to read the article "Microsoft's Rule Engine Scalability Results – A comparison with Jess and Drools" by Charles Young.