Tuesday, 29 March 2016

HTTPS Support for a Casablanca Client


This post is a continuation of a previous one about SSL and Casablanca C++ REST-API library by Microsoft. There we added SSL support to a Casablanca server. This time we'll do it on the client side. At the first sight it seems to be simple, just write:
  web::http::client::http_client restClient(U("https://xxx.yyy.zzz"));
  auto response = restClient.request(web::http::methods::GET);
and go on! As we saw, the only thing to do is to use a HTTPS URI. Nothing more to do? Not really! The main problem when adding SSL support to the client occurred to be the error handling. More specifically, I wanted to differentiate between an SSL error and a simple no-connection error.

Casablanca uses the new standard std::error_condition/error_category classes for that. The problem is, these classes are yet poorly understood by programmers (at least by me, because it is so new)*, and, which is much worse, that their usage by Casablanca (as of v.2.5.0) isn't consistent between Windows and Linux code!

On Windows, there is a special windows_category defined for system errors, while for Linux the standard std::system_category is used. Worse, the error codes forwarded there come from Boost ASIO, which in its turn just forwards OpenSSL's error codes. Which again depends on the library version :-/.

As to tame this chaos a little, the following little helper class was born:
class CCasablancaClientErrorCode 
{
public:
  CCasablancaClientErrorCode(int errCode) : _errCode(errCode) {};
 
  /**
    Check and translate to POSIX-conforming system error code.

    For HTTP connection problems following codes are used by Casablanca:
        std::errc::host_unreachable, std::errc::timed_out, std::errc::connection_aborted
  */
  bool IsStdSystemError(std::errc& stdErrorCode) const
  {
#ifdef _WINDOWS
    const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);
#else  
    const std::error_condition ec = std::system_category().default_error_condition(_errCode);
#endif
    if (ec.category().name() != genericCategoryStrg) 
    {
      return false;
    }
    else
    {
      stdErrorCode = std::errc(ec.value());
      return true;
     }
  }
 
  /**
    Check if the reported error comes from SSL.

    There is only one, generic server SLL certificate error at the moment!
  */ 
  bool IsSslError() const
  {  
#ifdef _WINDOWS
    const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);

    if (ec.category() == utility::details::windows_category())
    {
      return _errCode == ERROR_WINHTTP_SECURE_FAILURE;
    }
    else
    {
      return false;
    }
#else
    const std::error_condition ec = std::system_category().default_error_condition(_errCode);

    if (ec.category().name() == genericCategoryStrg)
    {
      return false;
    }
    else
    {
      // ??? OPEN TODO:::: must test, but it will depend on the SSL version !!!
      // - patch Casablanca's ASIO code?
      // return _errCode == 336458004; //== 0x140DF114  // OpenSSL error code  ??OR?? 335544539 == 0x140000DB

      return true; // should be SSL 
    }
#endif
  }

private:
  int _errCode;
};
Note that on Windows, only a single generic SSL error code is supported by Casablanca 2.5. If we wanted to differentiate between SSL error types, we'd have to patch Casablanca code (install a special callback handler to report the context of the HTTP error). I didn't make it, as generic SSL error was good enough for my customer.

Now now we can use this helper class like that:
  CCasablancaClientErrorCode clientErrCode = e.error_code().value();
  std::errc sysErrCode;

  if (clientErrCode.IsStdSystemError(sysErrCode))
  {
    switch (sysErrCode)
    {
    case std::errc::host_unreachable:
      // no connection error!
      break;
    default:
      // other error (std::errc::timed_out, std::errc::connection_aborted)
    }
  }
  else if (clientErrCode.IsSslError())
  {
    // SSL error!
  }
Having done that, the HTTPS support on the Client side was ready.

--
* I won't explain the workings of std::error_category here, for the confused readers this was already done here (parts 1 to 5, I said it's not a piece of cake).

Empirical Results on the Static vs. Dynamic Debate?


You may know that I'm quite interested in the "Static vs. Dynamic Typing" controversy, and already even clearly declared my allegiance basing my decision on some pretty sloppy reasoning (if not largely on the gut feeling...).

But what about some hard facts, numbers, measurements and controlled experiments to decide the debate? Like the Leibnizian* "Calculemus!". Well, some good soul (@danluu) took the burden of this task and looked** at the best known papers, talks  and comparison studies. To what avail?

First there's a warning:
"If you look at the internet commentary on these studies, most of them are passed around to justify one viewpoint or another. The Prechelt study on dynamic vs. static, along with the follow-ups on Lisp are perennial favorites of dynamic language advocates, and github mining study has recently become trendy among functional programmers."
So apparently, there's little interest to find out the truth, only to defend convenient positions. But we are fearless seekers for the truth, and want to know what the results are. Unfortunately the classic studies, when examined in more detail reveal some serious methodological flaws, and the author could only conclude that:
"Other than cherry picking studies to confirm a long-held position, the most common response I’ve heard to these sorts of studies is that the effect isn’t quantifiable by a controlled experiment."
OK, pretty sobering, we still know nothing, it's all lies, damn lies and/or lacking statistical knowledge. But how come the theoretically superior static typing, and that by a simple argument - how can be more support from the compiler not be better than less support - do not bring any measurable improvements?

Lets reiterate the position taken by supporters of dynamic typing in words taken from a  talk at JavaZone:
If these academic computer scientists would get out more, they would soon discover an increasing incidence of software developed in languages such a Python, Ruby and Clojure which use dynamic, albeit strong, type systems. They would probably be surprised to find that much of this software—in spite of their well-founded type-theoretic hubris—actually works, and is indeed reliable out of all proportion to their expectations.
So it seems to work in real life, and the experimental studies cannot prove the contrary. But it also looks like someone is trying hard to ignore something, don't you think so?

But then I discovered a recent paper which proposed an new angle of attack!

While the old studies tried to measure the performance of programmers (and somehow failed to design a meaningful experiment) the authors of the new one pursued a more modest goal. They introduced random changes to program's code and then tried to compile and run it. In their words:
"Our study is based on a diverse corpus of programs written in several programming languages systematically perturbed using a mutation-based fuzz generator.
....
More importantly, our study also demonstrates the potential of comparative language fuzz testing for evaluating programming language designs."
We thus could have a possibility to compare programming languages in some way? Sounds promising! So what are the results?
"The results we obtained prove that languages with weak type systems are significantly likelier than languages that enforce strong typing to let fuzzed programs compile and run, and, in the end, produce erroneous results."
The exact results are maybe interesting***:

Ruby
   - compile: 54%
   - run       :  31%
 Python
   - compile: 43%
   - run       :  25% 
PHP
: (that poster child of bad language design!)
   - compile: 48%
   - run       :  41% 
Javascript

   - compile: 49%
   - run       :  23% 
Java

   - compile: 18%
   - run       :  15% 
Haskell

   - compile: 19,5%
   - run       :  18% 
C++

   - compile: 20,5%
   - run       :  12%

See, On the average, dynamically typed programs are twice as likely to be compiling/running wioth nonsensical source code. That's definitely SOME improvement, isn't it?

On the other side: they are testing programming languages' resiliency against typos - but wasn't that the exact reason for introduction of types in programming languages? So we know that typed languages are good in what they were invented for. But we knew that already.

Update: This blogpost sets up a different thesis: the difference between statically and dynamically typed languages isn't that important. Rather than that, the simplicity of the language design is decisive in order to avoid bugs! IMHO this argument is misguided, because what I wanted to know is, given two languages of "same complexity" (problems, problems, hot to measure that?) which is a better one - the one with or without static typing?

--
*  Nice little fact - Leibniz's program for mathematization of human knowledge was formulated in the following words:
"Actually, when controversies arise, the necessity of disputation between two philosophers would not be bigger than that between two computists. It would be enough for them to take the quills in their hands, to sit down at their abaci, and to say  (as if inviting each other in a friendly manner): Let's calculate! (Calculemus!)"
Unfortunately, this cannot be done (at least according to Turing) and flame wars still prevail.

 ** http://danluu.com/empirical-pl/ - "Static vs. Dynamic Languages: A Literature Review"

*** Caution: they are not exact, I read it out of a diagram!