Posts Tagged ‘s3’

HTTP Clients for Squeak

May 25, 2009

Cloudfork-AWS makes the Amazon Web Services (AWS) S3, SQS and SimpleDB easily accessible from Smalltalk. All communication between the Smalltalk image and AWS is done via HTTP, so an HTTP client is an important requirement for Cloudfork-AWS.

Cloudfork-AWS needs more than simple HTTP GET and POST requests; the following features are also needed:

  • Setting custom request headers – S3 uses custom headers for authentication and for attaching meta-data to S3 objects. We need to be able to set these headers. This feature is also required for range requests, which let you download part of an S3 object instead of the entire object.
  • Access to the response headers – So we can read the meta-data of S3 objects.
  • Support for PUT, HEAD and DELETE requests – Also required for S3. PUT is required for storing objects and creating buckets. DELETE is required for removing objects and HEAD for getting the object meta-data without downloading the object itself.
  • HTTPS support – The AWS services can be accessed via plain HTTP or via secure HTTPS; the choice is up to the client. But the release notes of the latest SimpleDB releases mention that HTTP support will be deprecated and that future versions will require HTTPS.
  • HTTP/1.1 support – Not a must-have feature, but version 1.1 requests can be more efficient than version 1.0 requests because of the keep-alive feature of version 1.1, which lets socket connections be reused between requests.
  • Streaming uploads and downloads – Also not a must-have feature for most use cases; it is only needed when handling large S3 objects.
  • Proxy support – Not a requirement of any of the AWS services, but a feature that is often required by client configurations.
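To make the first two requirements concrete: a range request is nothing more than an extra request header on a normal GET, and reading object meta-data is just reading a response header back. The sketch below uses illustrative message names, not the actual API of any of the clients discussed here:

"Download only the first kilobyte of an S3 object (hypothetical client API)"
request := HTTPRequest get: 'http://mybucket.s3.amazonaws.com/someKey'.
request headerAt: 'Range' put: 'bytes=0-1023'.
response := request execute.
"Read back a piece of S3 object meta-data from the response headers"
author := response headerAt: 'x-amz-meta-author'.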

Note that most of these features are only required for S3, not for SQS and SimpleDB. SQS and SimpleDB use only GET and POST requests, and their authentication is done in the URL rather than through HTTP header fields. The HTTP responses of SQS and SimpleDB always contain XML, and the maximum size is about 8 KB for SQS and 1 MB for a SimpleDB result set, so streaming support is not required.

As far as I know there are three HTTP clients available for Squeak:

  • The HTTPSocket class – This class is part of the Network-Protocols package and is part of the standard images of the latest Squeak and Pharo versions.
  • SWHTTPClient – This is an extensive HTTP client library. It was originally developed for Dolphin Smalltalk and was ported to Squeak. The latest release is not fully compatible with the latest Squeak release. There are a number of class extension collisions.
  • CurlPlugin – This is a Squeak plugin that uses the libcurl C library. libcurl is a well-known and powerful open source “URL transfer library” with support for HTTP, FTP and many other protocols.

HTTPSocket

This is a very simple implementation of an HTTP client in a single class. HTTP GET and POST requests are supported, access to the headers is possible, and simple proxy configurations are supported. HTTP version 1.1 is not supported, and neither is HTTPS.

The current version of Cloudfork-AWS does not work with HTTPSocket as an HTTP client. With the provided functionality it should be possible to support the SQS and SimpleDB APIs, but when I use HTTPSocket I get an AWS error telling me that the calculated signature is wrong. I think this is because HTTPSocket always adds the port number to the Host header field; Cloudfork doesn’t include the port when it calculates the signature, so you get a mismatch. It is on my todo list to fix this.

SWHTTPClient

SWHTTPClient is a full-featured HTTP client library. It supports HTTP/1.1, access to the header fields, and the PUT, HEAD and DELETE methods. Streaming uploads and downloads are also possible. The one thing that is not supported, or that I couldn’t get working, is HTTPS. Perhaps it’s possible to get this working by plugging in the Cryptography package, but I have no idea how.

Another issue is that SWHTTPClient is not fully compatible with the latest Squeak and Pharo releases. The package contains some class extensions that override existing methods with different behavior, for example the String>>subStrings: method.

Cloudfork-AWS can use SWHTTPClient; all AWS features work except HTTPS. I have fixed all the incompatibilities I bumped into. The patched version of SWHTTPClient is available from the Cloudfork project page on SqueakSource.

CurlPlugin

The installation of this library is a bit more work. You need to place the correct binaries for your platform in the Squeak installation directory and load the CurlPlugin package from SqueakSource. When you load the package you may get a warning that the class CurlPlugin cannot be loaded. This is no problem; you can still use the plugin through the Curl class. The CurlPlugin class is only needed if you want to create a new version of the plugin or support a new platform.
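Once the binaries are in place, requests go through the Curl class directly. A minimal sketch; the selector shown here is an assumption on my part, so check the Curl class protocol for the actual messages:

curl := Curl new.
"Fetch a URL over plain HTTP; the selector name is illustrative"
contents := curl getUrl: 'http://aws.amazon.com/'.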

The libcurl library that the CurlPlugin uses supports all the HTTP features we need and many more. It is one of the best HTTP client libraries around, and it’s open source. It has an optional integration with OpenSSL, which provides the functions required for HTTPS.

The current version of the CurlPlugin doesn’t expose all the features of libcurl. Currently HEAD and DELETE requests are not supported, and it is not yet possible to set the header fields for a request. The other methods work very well, and HTTPS also works fine.

Cloudfork-AWS

For the SimpleDB and SQS services the CurlPlugin is the best HTTP client: all the required features are there and the performance is very good. SimpleDB and SQS also work with SWHTTPClient, only without HTTPS support. If the Curl class is present in your image, Cloudfork-AWS will use it for all SimpleDB and SQS service calls; otherwise SWHTTPClient is used.

The current CurlPlugin doesn’t support all the features required by the S3 service. For this reason the Cloudfork S3 functionality requires the SWHTTPClient.

Future work

I think the CurlPlugin has the potential to become a very good HTTP client library for Squeak and Pharo. It will also be relatively easy to maintain this library because all of the complex work of supporting the different protocols is implemented in libcurl. This C library has a very large community and is well maintained. I will try to extend the plugin and add the missing features.

I will also try to make Cloudfork-AWS compatible with HTTPSocket. This will not be the best performing solution but it can be an easy starting point.

Problems with Daylight saving time in VA Smalltalk

March 29, 2009

All the requests that Cloudfork-AWS sends to the Amazon web services contain the current date and time in Coordinated Universal Time (UTC). If this timestamp differs by more than a few seconds from the current time, you get an error, for example the S3 error RequestTimeTooSkewed – The difference between the request time and the current time is too large. The reason for this time check is security: it prevents “record and playback” (replay) attacks.

So systems that make use of AWS must have the correct time, and the timezone must be correct as well; otherwise the conversion to UTC will give the wrong result. A few days ago this all worked perfectly in VA Smalltalk, but tonight all AWS calls fail :-( Last night we in the Netherlands switched to daylight saving time (DST). VA Smalltalk doesn’t seem to handle this very well: a call to “DateAndTime now” still returns an offset from UTC of one hour instead of two. It seems that this is a known problem.
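You can see the symptom directly in a workspace; during DST the offset for the Netherlands should be two hours:

DateAndTime now offset.
"During DST this should answer a two-hour offset from UTC,
 but the image still answers one hour"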

Until this problem is fixed we have to use a less than elegant solution to get things working again. We have added a “DSTMode” flag; when this flag is true we subtract an extra hour when converting to UTC. You can enable this mode by executing:


CFPlatformServiceVASTUtils enableDSTMode: true

Support for the DSTMode was built into Cloudfork version jvds.79.

VA Smalltalk version of Cloudfork is ready for use

March 27, 2009

All the functionality of Cloudfork-AWS is now also available for VA Smalltalk. With Cloudfork-AWS you can access the Amazon S3, SQS and SimpleDB services through a simple-to-use Smalltalk interface. The code is hosted at VAStGoodies.com, the SourceForge for VA-related projects.

All tests are green!

As you can see, all tests pass. Porting from one Smalltalk dialect to another is a tedious job; there are a lot of little differences you have to take care of. For example, asSortedCollection sorts case-insensitively in VA Smalltalk and case-sensitively in Squeak/Pharo, which caused the AWS signatures to be calculated incorrectly in VA. The functionality for parsing XML and using HTTP is also completely different. We have isolated all this dialect-specific code in a separate package/application.

For installation instructions and for reporting issues you can use our project page on Google code: http://code.google.com/p/cloudfork/wiki/InstallingForVASmalltalk

Uploading to an Amazon S3 Bucket from Seaside

February 18, 2009

The Cloudfork repository on SqueakSource contains a package named Cloudfork-S3Upload. This package contains a Seaside component that shows how you can let a web user upload data directly to an S3 bucket. The Seaside component uses the AWS POST feature to implement this.

An advantage of this approach is that the data goes directly to S3 and doesn’t pass through your server. This increases the scalability and robustness of the system, especially when uploading large files such as media files.

The Cloudfork-S3Upload package doesn’t depend on any other Cloudfork package. It depends on the Seaside web framework and also on the Cryptography Team Package (required for the SHA1 hash function). I used the alpha version of Seaside 2.9 to develop this package.

My ambition is to develop a reusable S3 upload component that uses AJAX functionality to start the upload process without submitting the complete page. In this way it will be possible to perform multiple uploads simultaneously from a single page. This is not trivial and with my limited AJAX knowledge it will take me some time to get this working. The current code (version jvds.3) contains the functionality to configure the policy and to make a signature of this policy. The Seaside component CFS3UploadExample1 shows how this code can be used to implement an upload form. This sample form doesn’t have any fancy AJAX functionality but it does work without problems.

Getting started with S3 using Cloudfork

January 9, 2009

Cloudfork provides access to the REST API of the Amazon S3 Web Service that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

Two basic concepts of S3 are buckets and objects. Buckets are essentially named containers of S3 objects. An S3 object refers to data and has a key. To some extent, these concepts are similar to folders and files in a filesystem. Using the S3 service, you can create buckets, put objects in them and get each object using its key. In addition, with S3 you can control the access (public, private) of buckets and objects. This subject will be discussed in another post.

Before using any of the Amazon Web Services, you should have an AWS account and you should sign up for the S3 storage service. Once you have created that account, you can create a Cloudfork AWSCredentials object which is needed for all communication with AWS.

| awsCredentials s3 response bucket result |
awsCredentials := CFAWSCredentials
    newWith: '<your access key id>'
    andSecret: '<your secret access key>'.

Create a Bucket

Use CFSimpleStorageService to create and delete buckets. Each API message in Cloudfork returns a response object; for this service that will be a CFS3Response. Every response has a result and, in case of an error, detailed information about the reason for failure. You should always inspect the response to see whether it was successful.

s3 := CFSimpleStorageService newWith: awsCredentials.
response := s3 createBucketNamed: '< your new bucket name >'.
response isError
    ifTrue:[self error: 'creation failed because:',
      response errorMessageText].

Put an Object

You need a CFBucket object to access objects. Use the service (temporary variable s3) to open one. Now you have a reference (temporary variable bucket) to a service object that can store and retrieve objects in your bucket. Any data (from 1 byte up to 5 GB) that you put in your bucket must have a key (a String). Keys must be unique within a bucket.

bucket := s3 openBucketNamed: '< your existing bucket name >'.
response := bucket
    putObject: 'Small in the Cloud'
    as: 'someKey'.
response isError
    ifTrue:[self error: 'storing the object failed because:',
      response errorMessageText].

You can also use the Dictionary method at:put: for accessing the bucket. The example above stores a String object in a bucket but any collection of bytes can be stored. For instance, you can also pass in a FileStream as the “put” argument.
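The Dictionary-style access mentioned above looks like this:

bucket at: 'someKey' put: 'Small in the Cloud'.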

Put an Image from File

response := bucket
    putObjectAs: 'picture'
    readFrom: (FileStream fileNamed: 'picture.jpg')
    withHeaders: ((Dictionary new) at: 'Content-Type' put: 'image/jpeg'; yourself).
response isError
    ifTrue:[self error: 'storing the image object failed because:',
      response errorMessageText].

In a follow up post, we will discuss best practices in choosing keys for objects. One of them is using the “/” separator character to simulate a filesystem.

The examples below are straightforward. We invite you to write an example that lists all object keys of one of your buckets.

Get an Object

response := bucket
    getObject: 'someKey'
    writeOn: (result := WriteStream on: ByteArray new).
response isError
    ifTrue:[self error: 'access to the object failed because:',
      response errorMessageText].

Delete an Object

response := bucket deleteObject: 'someKey'.
response isError
    ifTrue:[self error: 'object removal failed because:',
      response errorMessageText].

Note: removeKey: can also be used; it does not return a CFS3Response.
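For example:

bucket removeKey: 'someKey'. "no CFS3Response to inspect"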

Delete a Bucket

response := s3 deleteBucketNamed: '< your existing bucket name >'.
response isError
    ifTrue:[self error: 'delete bucket failed because:',
      response errorMessageText].

This post showed you the basic usage of the Cloudfork S3 services. Browse the classes in this package to see how other S3 API services are mapped to Smalltalk messages. Future posts will discuss these in more detail.

Cloudfork AWS packages and some rough edges

January 5, 2009

Update: Installation instructions are described and maintained on Cloudfork @ googlecode

If you want to get started with Cloudfork AWS you need to know what packages to load and what the prerequisites are. The current implementation also has some rough edges that you need to be aware of. As mentioned on SqueakSource the prerequisite packages are:

  • Cryptography Team Package
  • HTTPClient

Both packages can be installed using the SqueakMap Package Loader or the Universe Browser. The Cryptography package is needed for the HMAC-SHA1 and HMAC-SHA256 signature functions; Amazon won’t accept your requests without them. The HTTPClient package is needed for handling the HTTP REST calls.

Packages

For Cloudfork AWS you need to load three packages using Monticello:

Cloudfork-AWS: The main package; it contains most of the code. Only the Smalltalk-dialect-specific functionality has been factored out.

Cloudfork-Squeak-Platform: Package with functionality that we think is Squeak specific. This is mostly done by subclassing base classes from Cloudfork-AWS.

Cloudfork-Tests-AWS: This optional package contains unit and integration tests. They can be used to verify that everything works correctly, and the integration tests in particular give a good example of how the APIs can be used.

We have used the Seaside 2.9 project as an example of how to set up the package structure. Also note that before you can actually run the integration tests you need an AWS account that is enabled for the specific service you want to test (SimpleDB, SQS or S3). You can set your AWS account credentials using the class method defaultAwsKey:andSecret: of the class CFAWSIntegrationTest.
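For example, execute this once in a workspace before running the integration tests, with your own credentials substituted:

CFAWSIntegrationTest
    defaultAwsKey: '<your access key id>'
    andSecret: '<your secret access key>'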

Rough edges

Although the packages are already usable, there are still some rough edges you need to be aware of:

  • Proxy support – If you want to use Cloudfork AWS from behind a proxy you need to define this somewhere. We haven’t yet made a public method to configure this. For now you will need to change the initialize method of CFAWSRESTSqueakAccess. The comment contains a sample proxy definition.
  • HTTPS – All the AWS services with a REST based interface can be used with plain HTTP and with secure HTTP (HTTPS). Until now we have only tested with plain HTTP.
  • UTF-8 – All HTTP calls to AWS should be encoded using UTF-8. Currently this works for standard characters; we have not done any testing with special or non-Western characters.

Introducing Cloudfork AWS

January 4, 2009

Cloudfork AWS is a new open source project that provides easy access from Smalltalk to the Amazon Web Services that are related to cloud computing. We started development in December on the following services:

  • SimpleDB – “a web service providing the core database functions of data indexing and querying”
  • Simple Queueing Service (SQS) – “offers a reliable, highly scalable, hosted queue for storing messages as they travel between computers”
  • Simple Storage Service (S3) – “provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web”

The complete SimpleDB and SQS APIs are already available from Smalltalk. We are still working on completing the S3 API.

We use Squeak as our Smalltalk development platform and host the code on SqueakSource. The plan is to port Cloudfork AWS to other Smalltalk dialects as soon as the code is reasonably stable. The goal of the Cloudfork AWS project is simply to make these APIs easily usable from Smalltalk. We plan to start a number of other projects that use these APIs to provide even more interesting functionality. Keep watching this blog for more information.

