Sunday, 20 September 2015

Casablanca C++ REST Framework - One Year Later.


Approximately one year ago I started a new project for one of my customers with the aim of adding a REST interface to their redesigned product. The new release was meant to transform their so far desktop-only, Windows-only application into a cross-platform, distributed, client-server one, using an HTTP API for communication.

The technologies to be used were Qt, C++11, Windows and Apple's OS X. The question was how to implement the REST interface.

The Setup

After investigating some choices*, I settled on Microsoft's C++ REST SDK, code-named Casablanca. It was cross-platform (sic!), used modern C++ constructs (C++11, you knew I'd like that!) and was open source (sic!). Sounds like it wasn't Microsoft, but I think we still need some time to get used to the new Microsoft.

There were some problems, though. The client chose the Qt 5 framework for portability, and initially I was worried whether Casablanca would play well with Qt's "I am the world" attitude, its message pump and threading model. Moreover, the server implementation resides in an "experimental" namespace, which normally isn't a good sign either!

On the positive side there was JSON support and a nice asynchronous file transfer implementation based on PPLX tasks (PPL on Windows; for Linux, OS X, etc. Microsoft wrote a port). This was a big one, as the main functionality of the server would be processing files, and the input files would mostly be uploaded from other machines. And of course the biggest one - it's open source!

So the endeavor was not without risk! What can I say a year, give or take a couple of months, after we started?

Highlights

One of the highlights is of course the task-based implementation of asynchronous processing used in Casablanca's API, like here, in the already mentioned, built-in support for file transfers:
CCasablancaFileTransfer::Task CCasablancaFileTransfer::StartFileDownload(const QString& downloadUrl, const QString& localFilepath) const
{
  using concurrency::streams::istream;
  using concurrency::streams::streambuf;
  using concurrency::streams::file_buffer;

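  // QString's method is toStdString(); on Windows utility::string_t is wide, so toStdWString() is needed there
  web::http::uri url(downloadUrl.toStdString());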
  web::http::client::http_client client(url);

  web::http::http_request getRequest(web::http::methods::GET);
  getRequest.headers().add(web::http::header_names::accept, "application/octet-stream"); 

  return client.request(getRequest)
    .then([=](pplx::task<web::http::http_response> previousTask)
  {
    try
    {
      auto response = previousTask.get();

      if (response.status_code() != web::http::status_codes::OK)
      {
        QString errTxt = ".....";
        return pplx::task_from_result(std::make_pair(false, errTxt));
      }

      try
      { 
        streambuf<uint8_t> localFile = file_buffer<uint8_t>::open(localFilepath.toStdString()).get();

        return response.body().read_to_end(localFile)
          .then([=](pplx::task<size_t> previousTask)
        {
          streambuf<uint8_t>& nonconstFile = const_cast<streambuf<uint8_t>&>(localFile);
          nonconstFile.close().get();

          // ETag?
          QString maybeEtag = ftutil::FindHeader(response, web::http::header_names::etag);

          return pplx::task_from_result(std::make_pair(true, maybeEtag));
        });
      }
      catch (...)
      {
        return TranslateFileException(localFilepath);
      }
    }
    catch (...)
    {
      return TranslateWebException();
    }   
  });
}
Please note that each block following a .then() is executed asynchronously, in a separately scheduled thread (or task), once the preceding step has finished! You can do the same on the client side of course. Alternatively you can force blocking processing of a task by calling its get() method.

If you like comparisons, you may have a look at Facebook's Futures in fbthrift. They generally work like Casablanca's tasks, but have an additional nice onError() clause and even the possibility to choose a specific executor!

Note: I won't give an introduction to Casablanca here; the basic usage has been explained several times on the Web (look here and here for basic client usage, here for a basic server example, and here for file transfers). However, what I found missing in all the intro material I've seen is any mention of exception propagation between asynchronous tasks. The problem is that a thrown exception has to be "observed" by the library user, and if it isn't observed, Casablanca will "fail fast", i.e. take down the server in the destructor of the task that threw. Surprise, surprise, your server is crashing! An exception is observed (i.e. marked as such and then rethrown) if the .get() or wait() methods are called for that task or one of its continuations. So be cautious! The above code thus needs an additional try-catch clause around the final .get() call, but I omitted it for the sake of simplicity...
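
The "observe at the final get()" pattern can be sketched with the standard library alone. This is an analogy, not Casablanca code: std::future stands in for pplx::task, and StartFailingStep and WaitAndTranslate are hypothetical names. Unlike Casablanca, std::future does not fail fast on an unobserved exception, but it stores the exception and rethrows it at get() in the same way, and that is where the try-catch belongs:

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an asynchronous step that fails.
std::future<int> StartFailingStep()
{
    return std::async(std::launch::async, []() -> int {
        throw std::runtime_error("download failed");
    });
}

// Observes the stored exception at the final get() and translates it
// into the (success, text) pair style used in the listing above.
std::pair<bool, std::string> WaitAndTranslate(std::future<int> step)
{
    try
    {
        step.get(); // rethrows whatever the asynchronous step threw
        return std::make_pair(true, std::string());
    }
    catch (const std::exception& e)
    {
        return std::make_pair(false, std::string(e.what()));
    }
}
```

Calling WaitAndTranslate(StartFailingStep()) yields a pair whose first member is false and whose second member carries the exception text.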

So it didn't take long, and I could announce:

Problems

1. The only really big problem that hit us was performance. The trouble was that after a couple of thousand file uploads or downloads, server performance tumbled into a free fall: the same basic polling which normally had taken 2-3% of the CPU time surged to 20-30% after that! That was a real show stopper at first. And of course it wasn't our code that showed up in the profiler, it was some of Casablanca's internals!

It took me about 2 weeks to investigate that, and maybe I'll write a more detailed post about it some time, but for now it suffices to say that Windows Concurrency Runtime (i.e. the native, task-based implementation of PPL) was left with an ever-growing internal list, which was sequentially scanned with each tick of the scheduler - at least in our environment of Windows 7 and Visual Studio 2013 plus an early Casablanca version (1.2.0).

I reported it to Microsoft, and they reacted pretty quickly - the next release was given a define (CPPREST_FORCE_PPLX) to disable the Concurrency Runtime and switch over to a Windows Threadpool based implementation of PPL. I got the latest version from the development branch, tested it, and voila! our performance problems vanished into thin air. We then waited for the next release (2.5.0), and when it came out we upgraded our code to it and suddenly everything worked like a charm. BTW, in the Casablanca version for Visual Studio 2015 the Windows Threadpool is the default setting (or so I was told)**.

2. Another problem was the lack of built-in CORS support; I had to implement it myself. It wasn't that difficult, but it's bound to our mixed Casablanca/Qt environment, so unfortunately I couldn't contribute it back to the project.

3. Then there were Qt-specific problems, the first of them being Qt's notorious usage of the /Zc:wchar_t- flag on Windows. This means that Qt doesn't treat the wide character as a native type (as the standard would require) but as a typedef to unsigned short. Thus you won't be able to link Casablanca to your Qt-compliant project, because std::wstring will be mangled to $basic_string@GU by Qt (and your code) but to $basic_string@_WU by the standard Casablanca build.

The remedy is to build Casablanca on your own (i.e. not to use the NuGet package) with /Zc:wchar_t-. This will first fail with an error complaining about a double definition for wchar_t, but then commenting out the redundant definition will suffice. Hmm, when I think of this, I should probably contribute this back to Casablanca with some #ifdef combination...
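
A compile-time check can make this mismatch visible before the linker does. A minimal sketch using only the standard library; kWcharIsNativeType is a hypothetical name, and which value a project should insist on depends on which side (Qt's setting or Casablanca's default build) it has to match:

```cpp
#include <type_traits>

// True when wchar_t is a distinct built-in type, as the C++ standard requires.
// Under MSVC's /Zc:wchar_t- it becomes a typedef for unsigned short,
// and this constant is false.
constexpr bool kWcharIsNativeType =
    !std::is_same<wchar_t, unsigned short>::value;
```

A translation unit that must link against a default Casablanca build could then add a static_assert on kWcharIsNativeType, and one that must match Qt's setting could assert the opposite.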

4. Another Qt-related problem I stumbled upon was nothing more than a new type of deadlock (at least for me). I'll dub it the "big signal-mutex confusion deadlock". It's quite interesting though: imagine that your handler does one half of the work in a Casablanca thread but the second one in the main Qt-thread. Then use some lock to protect a resource from parallel access.

Now imagine you lock the mutex in the Casablanca part of the GET handler and emit a signal to continue processing in the Qt context. Meanwhile another handler (e.g. DELETE), which didn't need to lock in the Casablanca context, has already emitted its signal and is now in the Qt context trying to lock the mutex. The problem is that the second part of the GET handler will never execute, as the message pump is blocked waiting for the mutex, which can only be unlocked by the next signal to be processed - deadlock. Mind-boggling? Well, threads are hard. Remedy: always lock the resource in the Casablanca part of the handler, even if you don't need it there.

Admittedly, the problem looks somewhat artificial, but that's a consequence of the somewhat inane requirement that the requests should finally be processed in the Qt context, a requirement originating from another part of the system, which I won't discuss or disparage here.
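
The remedy can be sketched without Qt or Casablanca. Everything below is an illustrative assumption rather than the project's code: EventLoop stands in for the Qt main thread, plain std::thread workers stand in for Casablanca's threads, and HandoverLock is a hypothetical lock that may be released on a different thread than the one that acquired it (std::mutex does not allow that). Because every handler acquires the lock in its worker half, the event loop itself never waits for it:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Lock that can be acquired on one thread and released on another.
class HandoverLock
{
public:
    void acquire()
    {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_cv.wait(guard, [this] { return !m_held; });
        m_held = true;
    }
    void release()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_held = false;
        }
        m_cv.notify_one();
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_held = false;
};

// Single-threaded event queue standing in for the Qt main thread.
class EventLoop
{
public:
    void post(std::function<void()> event)
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_events.push_back(std::move(event));
        }
        m_cv.notify_one();
    }
    // Runs exactly `count` events on the calling thread, then returns.
    void run(int count)
    {
        for (int i = 0; i < count; ++i)
        {
            std::function<void()> event;
            {
                std::unique_lock<std::mutex> guard(m_mutex);
                m_cv.wait(guard, [this] { return !m_events.empty(); });
                event = std::move(m_events.front());
                m_events.pop_front();
            }
            event(); // executed outside the queue lock
        }
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_events;
};

// Each handler locks in its worker half and unlocks in its event-loop half.
int RunHandlers(int handlerCount)
{
    HandoverLock resourceLock;
    EventLoop mainLoop;
    int processed = 0; // the protected resource, touched only while the lock is held

    std::vector<std::thread> workers;
    for (int i = 0; i < handlerCount; ++i)
    {
        workers.emplace_back([&] {
            resourceLock.acquire();     // first half: worker thread
            mainLoop.post([&] {
                ++processed;            // second half: event loop
                resourceLock.release();
            });
        });
    }

    mainLoop.run(handlerCount);
    for (auto& worker : workers) worker.join();
    return processed;
}
```

If one of the posted events called resourceLock.acquire() itself, as the DELETE handler did in the scenario above, the loop would block while the event that releases the lock is still queued behind it.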

5. A minor problem was timestamp comparison using Casablanca's utility::datetime class:
int CBasicLastModifiedMap::CompareSec(const utility::datetime& lhs, const utility::datetime& rhs)
{
#if _DEBUG
 // TEST:::
  auto lhsStrg = lhs.to_string();
  auto rhsStrg = rhs.to_string();
#endif

  int timestampDiffSec = lhs - rhs; // truncates up to seconds!

  if (timestampDiffSec == 0)
  {
    return 0;
  }

  // extra check, because timestampDiffSec always > 0 (Casablanca problem!)
  if (lhs.to_interval() > rhs.to_interval())
  {
    return 1;
  }

  return -1;
}
Bug or feature? - decide for yourself.
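
One plausible explanation, assuming to_interval() yields an unsigned count of 100 ns ticks, is that an unsigned subtraction wraps around when lhs is the earlier timestamp. Under that assumption the comparison can be written without any subtraction. CompareTicksSec is a hypothetical helper, and it truncates each operand to whole seconds, which is slightly different from truncating the difference as the listing above does:

```cpp
#include <cstdint>

// Compares two tick counts at whole-second granularity.
// Assumption: ticks are unsigned 100 ns units, so one second is 10,000,000 ticks.
int CompareTicksSec(std::uint64_t lhsTicks, std::uint64_t rhsTicks)
{
    constexpr std::uint64_t kTicksPerSecond = 10000000ULL;

    const std::uint64_t lhsSec = lhsTicks / kTicksPerSecond;
    const std::uint64_t rhsSec = rhsTicks / kTicksPerSecond;

    if (lhsSec == rhsSec)
    {
        return 0;
    }
    return lhsSec > rhsSec ? 1 : -1; // no subtraction, so nothing can wrap
}
```

It would be called as CompareTicksSec(lhs.to_interval(), rhs.to_interval()).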

6. One (minor?) problem we encountered was specifying all interfaces for the server to bind on. Normally you'd expect to be able to use "0.0.0.0", but Casablanca rejects it. After consulting the source code the solution was clear: just use "*" instead!***

Conclusion

Otherwise: we are quite happy!!! The system works cross-platform, the performance is good, no apparent problems there.

Well, no problems till now. My client didn't specify any security for the first version of the product; they assumed (quite reasonably) that the customers will use the new product only inside their trusted network at first. Neither is any notion of client identification or client roles planned, nor are we using gzipping of the HTTP data. Thus the more advanced features of an HTTP server weren't required.

As the next step towards more complicated scenarios, we'll look at HTTPS support in Casablanca. See you then...

Update:

OK, we actually found one weird bug while testing: "http_listener crashes when URI contains a pending square bracket" (bug report + proposed fix here). I had to patch it locally for the time being, but as it seems, it'll be fixed in version 2.6. Weird, other misconstructed URLs are rejected OK, only square brackets generate crashes. 
Update 2: same problem with square brackets in HTTP parameters, only I haven't resolved it yet (no time, a low prio bug). So the story continues.
Update 3: The above problem is resolved, see my code comments for explanation:
  // As of Casablanca 2.5 http_request::relative_uri() will throw exception if it encounters (correctly) encoded
  // "[", "]" or "#" characters in the query parameters!
  //  - workaround: try to extract the relative path by hand (message.absolute_uri().path() works)

Another Update (07 Sep. 2016):

I said we were overall happy with Casablanca, but there's a new problem, which could be pretty grave. Namely, the Casablanca server tends to crash sometimes. In normal operation it's very rare, but under severe overload it may happen rather more often :(. At first I thought some other parts of the code had dangling pointers, and put this error into a waiting room.

Recently I took some time to analyze it, and it seems to be a genuine Casablanca problem, which is somehow connected to the handling of timed-out client connections (as it seems at the moment). In Release 2.8.0 there was a pull request that refactored this part of the code to remove some race conditions, but I'm not sure if this was sufficient to fix the crash...

As soon as I've solved that, I'll blog. For the moment take heed, the server part is still in the "experimental" namespace (as of 2.8.0) and:
"The http_listener code is the least tested code in the library despite its age, so there are certainly a lot of issues here that need fixing."
--
* like: QtWebApp (by Stefan Frings), Qxt's WebModule, Tufao, QHttpServer, Pillow, nanogear, Apache's Axis2/C, Casablanca (of course!), cpp-netlib, gSoap WebServer, microhttpd (by GNU), libhttpserver (based on microhttpd), Mongoose, libevhttp. They were (for the most part) fine, sometimes even great, but virtually none of them had C++11 or async file transfer support!

Update: Facebook's Wangle could be a good candidate too, it seems to have a decent async. support (see here), but.... : at that time I didn't find it (definitely a show-stopper ;). At the time of this writing it's got problems compiling on OS X, and I'm not sure if it compiles on Windows at all. There's simply no statement on the project's page what platforms are supported. AFAIK Thrift (or was it Folly?) seems to compile on Windows, but this compiling mess could be a problem.

** as it seems, Concurrency Runtime isn't used anymore in Microsoft's STL implementation (at least in OS's newer than XP) - http://blogs.msdn.com/b/vcblog/archive/2015/07/14/stl-fixes-in-vs-2015-part-2.aspx:
"Using ConcRT was a good idea at the time (2012), but it proved to be more trouble than it was worth.  Now we're using the Windows API directly, which has fixed many bugs."
and:
"... std::mutex on top of ConcRT was so slow!"  
*** I was recently asked by a reader of this post in an email about that problem, and I realized I forgot to mention that in the 1st writeup. Sorry!

13 comments:

Anonymous said...

Nice overview. Thank you for keeping us updated.

Kalle Strunz said...

Great Blog! Any new updates?

Marek Krj said...

@Kalle Strunz
Not yet, but I hope I'll be able to come round to solve the crash problem in the next 2 months. As for now I tested a fix which removes the crashes, but unfortunately introduces a memory leak. I couldn't proceed with that though - new features were more important for my client.

Pooja said...

I ran a simple listener with cpprestsdk 2.9.1 in VS 2015, CPU consumption is very high around 30%.

#include

#include
#include

using namespace web;
using namespace web::http;
using namespace web::http::experimental::listener;
using namespace utility;
using namespace std;

#define TRACE(msg) wcout << msg;
#define TRACE_ACTION(a, k, v) wcout << a << L" (" << k << L", " << v << L")\n";

map dictionary;



void handle_post(http_request request)
{
TRACE(L"\nhandle POST\n");

utility::string_t input;
input = request.to_string();
utility::ofstream_t out("output.txt");
out << input;
out.close();

}

int main(int argc, char** argv)
{

http_listener listener(L"http://localhost:3277");
listener.support(methods::POST, handle_post);


try
{
listener
.open()
.then([&listener]() {TRACE(L"\nstarting to listen\n"); })
.wait();

while (true);
}
catch (exception const & e)
{
wcout << e.what() << endl;
}
}

how can it be modified to perform well?

I tried adding CPPREST_FORCE_PPLX, no change in CPU consumption.

Pooja said...

Hi Marek,

I need your support to resolve high cpu usage issue of cpprestsdk (casablanca).

I wrote a http listener using cpprestsdk 2.9

class Listener
{
public:
Listener() {}
Listener(utility::string_t url);

pplx::task<void> open() { return m_listener.open(); }
pplx::task<void> close() { return m_listener.close(); }
private:
void handle_get_or_post(http_request message);

http_listener m_listener;
};

Listener::Listener(utility::string_t url) : m_listener(url)
{
m_listener.support(methods::GET, std::bind(&Listener::handle_get_or_post, this, std::placeholders::_1));
m_listener.support(methods::POST, std::bind(&Listener::handle_get_or_post, this, std::placeholders::_1));
}

void Listener::handle_get_or_post(http_request message)
{
message.reply(status_codes::OK, "ACCEPTED");
};

And when i run this program, cpu usage for this listerner is around 30%

How it can be minimized? Why cpu usage is so high for cpprestsdk?

Regards,
Pooja

Marek Krj said...

@Pooja
As it looks this is a pretty standard usage of Casablanca - I'm doing it in that exact manner too. However I'm still on 2.6, so maybe there are some new problems here.

Are you on Windows or on Mac/Linux? CPPREST_FORCE_PPLX only makes sense on Windows, and as of Visual Studio 2015 it is (or it should be, I didn't check) the standard setting. However, the problems with that setup would only manifest themselves after 1000 or so files are uploaded or downloaded.

You probably have to use a profiler... Is the 30% CPU measured during the listener's idle time? DM me on Twitter!

Pooja said...

Hi Marek

I need to build a http listerer using VS2012 on windows, I am using VS2015 just for trial.
Could you please suggest me which casablanca build should i use for this configuration?

Regards,
Pooja

Marek Krj said...

@Pooja
I'm using VS 2013, Casablanca version 2.6. I don't know if there's a Casablanca version supporting VS 2012, as they have (had?) a policy of supporting only the recent 2(?) VS versions.

I think you should in any case use at least Casablanca version 2.6 (+CPPREST_FORCE_PPLX) because of the performance problems I described in the blogpost - the Windows Task library simply doesn't cut it!

Marek Krj said...

Pooja's problem resolved - Casablanca wasn't the culprit, blocking the main thread was. Consult your preferred Casablanca tutorial on how to use listeners.

Thanesh Gopal said...

Hi Mark,
Thank you for the article. I wanted to use Casablanca for my project and was investigating notification mechanism. It appears Casablanca has an implementation of WebSocket client, but not a server. Have you been able to implement a notification mechanism to send events to client apps? I wasn't able to find any documentation/tutorials on how to do this. Any help would be much appreciated. Thank you.
-Thanesh

Marek Krj said...

Hi Ganesh,
I didn't implement WebSocket support in the server, as my client didn't opt for it, but I had a look at a possible implementation.

Unfortunately I can only very dimly recall that I thought "OK, it's possible at the server side too". Basically a 3rd-party library was bundled with Casablanca 2.5, quoting from its description:

"WebSocket++ is a header only C++ library that implements RFC6455 The WebSocket
Protocol. It allows integrating WebSocket client and server functionality into
C++ programs. It uses interchangeable network transport modules including one
based on C++ iostreams and one based on Boost Asio"

As it seems it could be possible to integrate it into a Casablanca server, but only on Linux/Mac. Alternatively, you'd have to change the underlying infrastructure from PPL to Asio on Windows.

Never tried it though, and I don't know if there's any new development in the current Casablanca code - I was busy with other things.

Marek Krj said...

@Ganesh
Wait a minute - "It uses interchangeable network transport modules including one
based on C++ iostreams"!!! Thus in principle it should be possible to integrate it with the Windows implementation, as it defines its transport layer as iostreams... At least it would be worth a look.

I should probably do it and write a blogpost about it, but at the moment it is not possible :(, sorry.

Dillon said...

@Pooja
it's standard if you have while(true) in your code,it's also useless!!

Delete it