Posts tagged ‘simpledb’

Testing Cloudfork AWS SimpleDB based classes

September 20, 2009

The Cloudfork framework includes an alternative implementation of CFSimpleDB that stores all items in memory. This CFSimpleDBEmulator was initially created to support unit testing of the ActiveItem framework. With the exception of some query constructs, it implements the complete API and is therefore suitable for unit testing your own applications as well.

| emulator domain item |
emulator := CFSimpleDBEmulator new.
domain := (emulator createDomain: 'myapp.development') result. 
item := CFSimpleDBItem newNamed: 'user.dennis'.
item valueAt: 'birthday' put: '20060916'; valueAt: 'hobby' put: 'soccer'.
domain putItem: item.

After running the tests, you can inspect the emulator to see which items have been stored in which domains and what attributes they have. If you store the emulator in a class variable, you can keep the data around for development too.
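The idea behind such an emulator can be sketched in a few lines. The following is a minimal illustration in Python, not Cloudfork code: the class name `InMemorySimpleDB` and its methods are hypothetical, and it only models the domain / item / attribute nesting with string values.

```python
from collections import defaultdict


class InMemorySimpleDB:
    """Hypothetical stand-in for a SimpleDB client; nothing leaves the process."""

    def __init__(self):
        # domain name -> item name -> attribute name -> list of string values
        self.domains = defaultdict(dict)

    def put_item(self, domain, item_name, attributes):
        item = self.domains[domain].setdefault(item_name, {})
        for name, value in attributes.items():
            values = item.setdefault(name, [])
            # An attribute can hold several values, but each value only once.
            if str(value) not in values:
                values.append(str(value))

    def get_item(self, domain, item_name):
        return self.domains[domain].get(item_name, {})


db = InMemorySimpleDB()
db.put_item("myapp.development", "user.dennis",
            {"birthday": "20060916", "hobby": "soccer"})
stored = db.get_item("myapp.development", "user.dennis")
```

Because everything lives in plain dictionaries, inspecting `db.domains` after a test run shows exactly what was stored where.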

ActiveItem
Because the ActiveItem framework is built on top of SimpleDB, the same emulator class can be used to unit test those applications. ActiveItem uses a globally shared CFSimpleDB instance, so you only need to replace that with an emulated instance.

CFActiveItem activateWithSimpleDB: CFSimpleDBEmulator new.

Using the CFActiveItemSerializer you can even dump the items to the local file system for development convenience.

HTTP Clients for Squeak

May 25, 2009

Cloudfork-AWS makes the Amazon Web Services (AWS) S3, SQS and SimpleDB easily accessible from Smalltalk. All the communication between the Smalltalk image and AWS is done via HTTP, so an HTTP client is an important requirement for Cloudfork-AWS.

Cloudfork-AWS needs more than simple HTTP GET and POST requests. The following features are also needed:

  • Setting custom request headers – S3 uses custom headers for authentication and for attaching meta-data to S3 objects, so we need to be able to set these headers. This feature is also required for range requests, which download a part of an S3 object instead of the entire object.
  • Access to the response headers – So we can read the meta-data of S3 objects.
  • Support for PUT, HEAD and DELETE requests – Also required for S3. PUT is required for storing objects and creating buckets. DELETE is required for removing objects and HEAD for getting the object meta-data without downloading the object itself.
  • HTTPS support – The AWS services can be accessed via plain HTTP or via secure HTTPS. The choice is up to the client. But the release notes of the latest releases of SimpleDB mention that HTTP support will be deprecated and that future versions will require HTTPS.
  • HTTP/1.1 support – Not a must-have feature, but version 1.1 requests can be more efficient than version 1.0 requests because of the keep-alive feature of version 1.1, which lets socket connections be reused between requests.
  • Streaming uploads and downloads – Also not a must-have feature for most use cases; it is only needed when large S3 objects have to be handled.
  • Proxy support – Not a requirement of one of the AWS services but a feature that is often required by client configurations.
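To make the list above concrete, here is what those request features look like in a client that supports them. This is a sketch using Python's standard `urllib.request`, not a Squeak client; the bucket URL and the metadata value are made up, authentication headers are left out, and nothing is actually sent.

```python
import urllib.request

# Hypothetical bucket and key; the requests are only constructed, never sent.
url = "https://example-bucket.s3.amazonaws.com/report.pdf"

# HEAD: ask for the object's metadata without its body.
head_request = urllib.request.Request(url, method="HEAD")

# Range request: ask for the first 1024 bytes only.
range_request = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})

# PUT with a custom metadata header (S3 uses the x-amz-meta- prefix).
put_request = urllib.request.Request(
    url, data=b"hello", method="PUT",
    headers={"x-amz-meta-author": "dennis"})
```

A Smalltalk HTTP client needs equivalents of all three: a free choice of method, request headers that can be set, and response headers that can be read back.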

Note that most of these features are only required for S3 and not for SQS and SimpleDB. SQS and SimpleDB only use GET and POST requests and the authentication is done in the URL and not through HTTP header fields. The HTTP responses of SQS and SimpleDB always contain XML and the maximum size is about 8KB for SQS and 1MB for a SimpleDB resultset so streaming support is not required.

As far as I know there are three HTTP clients available for Squeak:

  • The HTTPSocket class – This class is part of the Network-Protocols package and is part of the standard images of the latest Squeak and Pharo versions.
  • SWHTTPClient – This is an extensive HTTP client library. It was originally developed for Dolphin Smalltalk and was ported to Squeak. The latest release is not fully compatible with the latest Squeak release. There are a number of class extension collisions.
  • CurlPlugin – This is a Squeak plugin that uses the libcurl C library. libcurl is a well-known and powerful open source “URL transfer library” with support for HTTP, FTP and many other protocols.

HTTPSocket

This is a very simple implementation of an HTTP client in a single class. HTTP GET and POST requests are supported, the headers are accessible and simple proxy configurations work. HTTP version 1.1 and HTTPS are not supported.

The current version of Cloudfork-AWS does not work with HTTPSocket as its HTTP client. The provided functionality should be enough to support the SQS and SimpleDB APIs, but when I use HTTPSocket I get an AWS error telling me that the calculated signature is wrong. I think this is because HTTPSocket always adds the port number to the Host header field. Cloudfork doesn’t do this when it calculates the signature, so you get a mismatch. It is on my todo list to fix this.
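Why a port number in the Host header would break the signature can be shown with a small sketch. It is written in Python rather than Smalltalk and assumes the AWS Signature Version 2 scheme, in which the host is part of the signed string; the secret, the parameters and the function name `sign_v2` are illustrative only.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_v2(secret, method, host, path, params):
    # Canonical query string: parameters sorted by name (code point order,
    # which equals byte order for ASCII names) and percent-encoded.
    query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items()))
    string_to_sign = "\n".join([method, host.lower(), path, query])
    digest = hmac.new(secret.encode(), string_to_sign.encode(),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


params = {"Action": "ListDomains", "SignatureVersion": "2"}
# What the client signs versus what a server would compute from "Host: host:80".
client_side = sign_v2("dummy-secret", "GET", "sdb.amazonaws.com", "/", params)
server_side = sign_v2("dummy-secret", "GET", "sdb.amazonaws.com:80", "/", params)
```

The two results differ although only the `:80` suffix changed, which would explain a rejected signature if the server recomputes it from the Host header it received.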

SWHTTPClient

SWHTTPClient is a full featured HTTP client library. It supports HTTP/1.1, access to the header fields and the PUT, HEAD and DELETE methods. Streaming uploads and downloads are also possible. The one thing that is not supported or that I couldn’t get working is HTTPS. Perhaps it’s possible to get this working by plugging in the Cryptography package but I have no idea how.

Another issue is that SWHTTPClient is not fully compatible with the latest Squeak and Pharo releases. The package contains some class extensions that override existing methods with different behavior, for example the String>>subStrings: method.

Cloudfork-AWS can use SWHTTPClient; all AWS features work except HTTPS. I have fixed all the incompatibilities I bumped into. The patched version of SWHTTPClient is available from the Cloudfork project page on SqueakSource.

CurlPlugin

The installation of this library is a bit more work. You need to place the correct binaries for your platform in the Squeak installation directory and load the CurlPlugin package from SqueakSource. If you load the package you may get a warning that the class CurlPlugin cannot be loaded. This is no problem; you can still use the plugin through the Curl class. The CurlPlugin class is only needed if you want to create a new version of the plugin or support a new platform.

The libcurl library that the CurlPlugin uses supports all the HTTP features we need and many more. It is one of the best HTTP client libraries around, and it’s open source. It has an optional integration with OpenSSL, which provides the functions required for HTTPS.

The current version of the CurlPlugin doesn’t expose all the features of libcurl. Currently HEAD and DELETE requests are not supported. It is also not yet possible to set the header fields for a request. The other methods work very well and HTTPS also works fine.

Cloudfork-AWS

For the SimpleDB and SQS services the CurlPlugin is the best HTTP client. All the required features are there and the performance is very good. SimpleDB and SQS also work with the SWHTTPClient, only without HTTPS support. If the Curl class is present in your image Cloudfork-AWS will use this class for all SimpleDB and SQS service calls, otherwise the SWHTTPClient is used.

The current CurlPlugin doesn’t support all the features required by the S3 service. For this reason the Cloudfork S3 functionality requires the SWHTTPClient.

Future work

I think the CurlPlugin has the potential to become a very good HTTP client library for Squeak and Pharo. It will also be relatively easy to maintain this library because all of the complex work of supporting the different protocols is implemented in libcurl. This C library has a very large community and is well maintained. I will try to extend the plugin and add the missing features.

I will also try to make Cloudfork-AWS compatible with HTTPSocket. This will not be the best performing solution but it can be an easy starting point.

Composition relations in Cloudfork-ActiveItem

April 20, 2009

In UML, the composition relation between objects is a special association that is used to model a “private-container” relationship. The typical class-room example is the Car object having 4 Wheel objects. Although you can replace wheels on a car, one particular Wheel object is never shared with other Car objects.

In Amazon SimpleDB there is no concept of relations; it is a simple store of items having attributes (key-value pairs). The Cloudfork-ActiveItem framework can map these relations to foreign-key-like attributes, but that should be used with care. Because SimpleDB is not a relational database, operations such as joins are simply not possible. However, the composition relation fits much better in the SimpleDB storage model. The notion of a SimpleDB item being a container of information is just what it is meant to be.

To illustrate how ActiveItem supports this design construct, I will give an example that models multiple-choice questions for an exam training application. A Question is a composition of 4 Choices; one of them is the correct answer to that question.

Question class>>describe: aQuestion

  aQuestion
    hasString: #code ;
    hasText: #text ;
    ownsMany: #choices

Choice class>>describe: aChoice

  aChoice
    hasText: #text ;
    hasBoolean: #isAnswer

When saving a Question, ActiveItem will create one SimpleDB item with both the attributes of the Question and the attributes of each Choice:

code -> '010-0001'
text -> 'What language is best for writing Web applications?'
choices.1.text -> 'Java'
choices.1.isAnswer -> 'false'
choices.2.text -> 'Ruby'
choices.2.isAnswer -> 'false'
choices.3.text -> 'PHP'
choices.3.isAnswer -> 'false'
choices.4.text -> 'Smalltalk'
choices.4.isAnswer -> 'true'

By supporting composition, class Choice can be a normal ActiveItem subclass with its own attribute description. However, as you can see in the example, Choice objects do not need an id (actually the collection index is the id).

Because of the limit on the number of attributes per item (currently 256), this composition solution is not suitable for arbitrarily large collections. If you expect this for your model then I suggest you use the normal hasMany: method that maps associations using “foreign-key” attributes.
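The flattening described above, including the attribute limit, can be sketched as follows. This is an illustration in Python and not the ActiveItem implementation; the function name `flatten` and the error it raises are assumptions.

```python
MAX_ATTRIBUTES_PER_ITEM = 256  # the per-item limit quoted above


def flatten(owner_attributes, collection_name, children):
    """Merge the owner's attributes with indexed copies of each child's attributes."""
    flat = dict(owner_attributes)
    for index, child in enumerate(children, start=1):
        for name, value in child.items():
            # The collection index takes the place of a child id.
            flat[f"{collection_name}.{index}.{name}"] = value
    if len(flat) > MAX_ATTRIBUTES_PER_ITEM:
        raise ValueError("too many attributes for a single item")
    return flat


question = {"code": "010-0001",
            "text": "What language is best for writing Web applications?"}
choices = [
    {"text": "Java", "isAnswer": "false"},
    {"text": "Smalltalk", "isAnswer": "true"},
]
item = flatten(question, "choices", choices)
```

With two attributes per Choice, roughly 127 choices would already exhaust the limit, which is why large collections are better mapped with separate items.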

Problems with Daylight saving time in VA Smalltalk

March 29, 2009

All the requests that Cloudfork-AWS sends to the Amazon web services contain the current date and time in Coordinated Universal Time (UTC). If this timestamp differs too much from the time on Amazon's servers (the documented tolerance is about 15 minutes), you get an error. For example the S3 error: RequestTimeTooSkewed – The difference between the request time and the current time is too large. The reason for this time check is security: it prevents “record and playback” (replay) attacks.

So systems that use AWS must have the correct time, and the timezone must be correct as well; otherwise the conversion to UTC gives the wrong result. A few days ago this all worked perfectly in VA Smalltalk, but tonight all AWS calls fail :-( Last night we in The Netherlands switched to Daylight saving time (DST). VA Smalltalk doesn’t seem to handle this very well. A call to “DateAndTime now” still returns an offset from UTC of one hour instead of two. It seems that this is a known problem.

Until this problem is fixed we have to use a less than elegant solution to get things working again. We have added a “DSTMode” flag. When this flag is true, we subtract an extra hour when converting to UTC. You can enable this mode by executing:


CFPlatformServiceVASTUtils enableDSTMode: true

Support for the DSTMode was built into Cloudfork version jvds.79.
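The arithmetic behind the workaround is small enough to spell out. The sketch below is Python, not the VA Smalltalk code; `to_utc` and its parameters are hypothetical names, and the example time is chosen for illustration.

```python
from datetime import datetime, timedelta


def to_utc(local_wall_clock, reported_offset_hours, dst_mode=False):
    # UTC = local time minus the offset the platform reports.
    utc = local_wall_clock - timedelta(hours=reported_offset_hours)
    if dst_mode:
        # Workaround: the platform missed the DST hour, so subtract it here.
        utc -= timedelta(hours=1)
    return utc


# Evening of 29 March 2009 in the Netherlands; the real offset is UTC+2,
# but the platform still reports UTC+1.
local = datetime(2009, 3, 29, 20, 0)
wrong = to_utc(local, reported_offset_hours=1)                 # 19:00, one hour off
fixed = to_utc(local, reported_offset_hours=1, dst_mode=True)  # 18:00, correct
```

A one-hour error is far outside what AWS tolerates, so every request is rejected until the extra hour is subtracted.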

Getting started with SimpleDB using Cloudfork

March 29, 2009

The Cloudfork-AWS project has classes to use the Amazon Web Services Simple Database (SimpleDB) directly from Smalltalk. Using these classes, you can create domains, put items, and query items even using regular Smalltalk blocks. Cloudfork contains the CFSimpleDB class that makes the generic calls available as Smalltalk methods. Calls that are related to a domain are implemented in the CFSimpleDBDomain class.

SimpleDB
In short, SimpleDB items are stored in a domain which has a name. Each item has a name and a collection of attributes. Each attribute has a name and one or more values. Values can be Strings only; the application must take care of conversion. SimpleDB provides a database in the cloud that supports large volumes of data that can be accessed anywhere on the Internet. Amazon Web Services (AWS) takes care of high availability, consistency, indexing and performance.

Before throwing away your current persistency, it is important to realize that SimpleDB is not a relational database. It does not provide relational consistency (constraints), does not have a schema and “records” contain String values only. Queries on items require expressions that have limited operators; no joins, no subselects (see documentation).
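Because values are compared as Strings, the conversion the application has to do matters for queries as well as for storage. A common approach, sketched here in Python with hypothetical helper names, is to zero-pad numbers so that string order matches numeric order. This simple version handles non-negative integers only; negative numbers would need an additional offset.

```python
def encode_int(value, width=10):
    """Encode a non-negative integer so that string order equals numeric order."""
    if value < 0:
        raise ValueError("this simple scheme handles non-negative integers only")
    return str(value).zfill(width)


def decode_int(text):
    return int(text)


plain = sorted(["100", "9", "25"])                        # compared as text
padded = sorted(encode_int(n) for n in [100, 9, 25])      # compared as text, padded
```

Without padding, "100" sorts before "25" and "9"; with padding the order is 9, 25, 100, so range comparisons in a query behave as expected.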

Use case
Besides being a simple object database, the service can be used to store reference data (e.g. zipcode tables, gps locations, currencies) or logging information (audits). Another example is storing social user profiles which typically have a variable set of properties. SimpleDB can also be used for storing metadata and references to S3 objects such as images, video and documents. Because an S3 object can contain any data, it can be used to store large attribute values that do not fit into a SimpleDB item.

Smalltalk
After subscribing to the SimpleDB services with your AWS account, you must create the credentials object:

awsCredentials := CFAWSCredentials  
  newWith: '<your access key>'  
  andSecret: '<your secret access key>'.

Create a SimpleDB domain
To store items we need a domain, so let us create one. Every API call returns a CFAWSResponse which must be checked for errors before using its result.

sdb := CFSimpleDB newWith: awsCredentials.
response := sdb createDomain: 'zipcodes'.
response isError 
  ifFalse:[domain := response result]

Add items
The variable “domain” will be an instance of CFSimpleDBDomain that has various methods to access its items. For convenience, class CFSimpleDBItem can be used to encapsulate the name of the item and its attributes (name, value pairs).

item := CFSimpleDBItem newNamed: '3768GX'.
item valueAt: 'city' put: 'Soest'.
item valueAt: 'country' put: 'Netherlands'.
domain putItem: item.

Query items using expressions
AWS SimpleDB offers two sets of api calls that support criteria-based retrieval. You can use the query syntax:

domain query: '[''country'' = ''Netherlands'']'.

Or the select expressions:

sdb select: 'select city from zipcodes'.

More details on querying using “select” and “query” can be found in the AWS SimpleDB documentation.

Query items using Block
Cloudfork has classes that support the use of normal Smalltalk blocks to define the select condition. See the documentation for all possible operators and functions and class CFSSWOperand for the Smalltalk counterpart.

domain selectAllWhere: [:each | each country = 'Netherlands'].
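How a block can be turned into a select expression is worth a short illustration. The sketch below shows the general trick in Python, not the Cloudfork classes: the block is called with a proxy object that records the comparison instead of evaluating it. All class and function names here are hypothetical, and only equality is supported.

```python
class Condition:
    def __init__(self, text):
        self.text = text


class Attribute:
    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Record the comparison as text instead of evaluating it.
        escaped = str(value).replace("'", "''")
        return Condition(f"{self.name} = '{escaped}'")


class ItemProxy:
    def __getattr__(self, name):
        # Any attribute access yields a recordable Attribute.
        return Attribute(name)


def select_all_where(domain, predicate):
    condition = predicate(ItemProxy())
    return f"select * from {domain} where {condition.text}"


query = select_all_where("zipcodes", lambda each: each.country == "Netherlands")
```

Here `query` holds the text `select * from zipcodes where country = 'Netherlands'`, which could then be sent as a regular select call.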

Delete a SimpleDB domain
Deleting a domain will also delete all its items. No warning here.

sdb deleteDomain: 'zipcodes'.

This post showed you the basic usage of the Cloudfork SimpleDB services. Browse the classes in this package to see how other SimpleDB API services are mapped to Smalltalk messages. Also have a look at the CFSimpleDBEmulator, which can be used for unit testing classes that use Cloudfork-AWS SimpleDB.

If you are planning to use SimpleDB as an object database, then have a look at Cloudfork-ActiveItem. It is a framework that can help in mapping your objects to SimpleDB items and takes care of String conversions, data sharding and can handle associations similar to Rails ActiveRecord.

VA Smalltalk version of Cloudfork is ready for use

March 27, 2009

All the functionality of Cloudfork-AWS is now also available for VA Smalltalk. With Cloudfork-AWS you can access the Amazon S3, SQS and SimpleDB services from a simple to use Smalltalk interface. The code is hosted at VAStGoodies.com, the SourceForge for VA related projects.

All tests are green!

As you can see, all tests pass. Porting from one Smalltalk dialect to another is a tedious job; there are a lot of little differences you have to take care of. For example, asSortedCollection is case insensitive in VA Smalltalk and case sensitive in Squeak/Pharo. Because of this the AWS signatures were calculated incorrectly in VA. The functionality for parsing XML and using HTTP is also completely different. We have isolated all this dialect-specific code in a separate package/application.
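A short Python sketch shows how much the sort order can matter. It assumes that the sorted collection holds the request parameter names that go into the signature, which is an inference and not something stated above; the parameter names are typical AWS ones.

```python
names = ["Timestamp", "Action", "AWSAccessKeyId", "DomainName"]

# Case sensitive: uppercase "W" sorts before lowercase "c".
case_sensitive = sorted(names)

# Case insensitive: "action" sorts before "awsaccesskeyid".
case_insensitive = sorted(names, key=str.lower)
```

The two orders disagree on whether AWSAccessKeyId or Action comes first. If the string to sign is built from the sorted names, both sides must use the same rule or the signatures will never match.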

For installation instructions and for reporting issues you can use our project page on Google code: http://code.google.com/p/cloudfork/wiki/InstallingForVASmalltalk

Cloudfork SimpleDB now supports Batch Put

March 26, 2009

Yesterday, Amazon AWS announced the availability of the feature known as “Batch Put” for its SimpleDB web services. This operation allows you to do a faster put of multiple items (max 25) using a single HTTP request in a transactional way: either all inserts and updates succeed or nothing gets processed. Read the documentation for details and the warning about URL limits.

Using Cloudfork-AWS, the operation can be used like this:

simpleDB := CFSimpleDB newWith: awsCredentials.
"create new or open existing domain"
domain := (simpleDB createDomain: 'cloudfork-batch-put') result.  "normally check for errors first"

"create some items"
item1 := CFSimpleDBItem newNamed: 'Jack'.
item1 valueAt: 'gender' put: 'male'.
item2 := CFSimpleDBItem newNamed: 'Jill'.
item2 valueAt: 'gender' put: 'female'.

"store them all at once"
domain batchPutItems: (Array with: item1 with: item2). "normally check for errors afterwards"
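With a maximum of 25 items per call, a larger collection has to be split before it is stored. The sketch below shows one way to do that in Python; the helper name `batches` is hypothetical and not part of Cloudfork.

```python
def batches(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


names = [f"item-{n}" for n in range(60)]
groups = list(batches(names))
sizes = [len(group) for group in groups]
```

Sixty items end up in three calls of 25, 25 and 10 items. Note that the all-or-nothing guarantee then applies to each batch separately, not to the collection as a whole.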

After adding the “batchPutItems:” method, one helper class and a few tests, I finished this feature within 2 hours thanks to the great Smalltalk language and a powerful IDE. So after a day, Cloudfork already supports the new API.

This feature is now available for Squeak/Pharo at Cloudfork. You can expect updates of the VA Smalltalk port (VAStGoodies.com) and VisualWorks port (Public Cincom Store) once we have finished the export/import.

Getting started with Cloudfork ActiveItem

January 16, 2009

ActiveItem is a lightweight persistency framework that uses Amazon SimpleDB for storage in the cloud. It is designed using the ActiveRecord pattern and influenced by the Rails framework. Read our previous post for an introduction and rationale. This post will take you through the steps in creating a small basic example that uses the ActiveItem framework.

Install

First of all, you need to start up your Smalltalk image. Go to the Cloudfork project page to read the installation instructions for your Smalltalk dialect.

Setup
ActiveItem provides an abstraction layer on top of SimpleDB. In order to work with ActiveItem objects, the framework needs information about your AWS account. This is done by activating the framework using your AWS credentials.

| awsCredentials |
awsCredentials := CFAWSCredentials
		newWith: '<your access key>'
		andSecret: '<your secret access key>'.
CFActiveItem activateWith: awsCredentials.

Class definition
The abstract class CFActiveItem must be the root class of all your ActiveItem objects. At this point, there is no need to specify instance variables.

CFActiveItem subclass: #ShoppingCart
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyProject'

Describe the Model
Because SimpleDB does not have the notion of a relational schema for items in its domains, the structure of items must be defined by the class of each storable object. To automatically map a ShoppingCart to a SimpleDB item, ActiveItem needs to know the attributes of a ShoppingCart and its relation to ShoppingCartItem. This description is defined using the class method describe: .

ShoppingCart class>>describe: aShoppingCart
    " self rebuild "

    aShoppingCart
       hasTimestamp: #createdAt ;
       hasString: #customerName ;
       hasMany: #shoppingCartItems.

ShoppingCartItem class>>describe: aShoppingCartItem
    " self rebuild "

    aShoppingCartItem
       hasString: #productName ;
       hasInteger: #quantity ;
       belongsTo: #shoppingCart.

Generate Accessors
Inside the comment of the describe: method you see the expression “self rebuild”. Evaluating this will run the description with an ActiveItemClassBuilder. This helper will add the required instance variables and generate accessors for both the attributes and associations in your ActiveItem class.

Store a new ShoppingCart
Before we can store ShoppingCart objects, we have to create a SimpleDB domain.

ShoppingCart ensureDomainExists.
ShoppingCartItem ensureDomainExists.

By default, the domain created will be Cloudfork.ActiveItem.ShoppingCart. You can change this by setting a different DomainShardingStrategy. This will be a topic of a future post. For now, we use this domain.

After creating the ShoppingCart and setting its variables, you simply send save to each object. This message returns a Boolean indicating whether the operation succeeded.

| cart item |
cart := ShoppingCart new.
cart createdAt: TimeStamp now.
cart customerName: 'Dennis'.
cart save.

item := ShoppingCartItem new.
item productName: 'Orange Teapot'.
item quantity: 2.
item shoppingCart: cart.
item save.

Retrieving the ShoppingCart
CFActiveItem provides several class methods to retrieve objects stored in SimpleDB. Simple Smalltalk blocks can be used to specify the selection criteria in finding objects. Try inspecting this code fragment.

| cart |
cart := ShoppingCart findFirst: [ :each |
    each customerName = 'Dennis' ].
cart shoppingCartItems.

This post showed the basic steps in using ActiveItem to store objects in Amazon SimpleDB. Each class requires an implementation of the describe: method, as this is the only way ActiveItem knows how to store and retrieve your objects. Future blog posts will discuss the ActiveItem API in more detail. Topics will include Sharding Strategy, Expressions, PagingList access to query results and Attribute Specifications.

Code examples can be found in the packages Cloudfork-ActiveItem-Examples and in Cloudfork-Tests-ActiveItem.

Cloudfork ActiveItem : ActiveRecord for Amazon SimpleDB

January 15, 2009

SimpleDB is an Amazon Web Service that provides a “database” in the Cloud. Basic concepts are domains, items and attributes. Domains are named containers of items which themselves have attributes. Attributes have a name and one or more values.

ActiveRecord is a design pattern described by Martin Fowler. Several implementations exist for different programming languages. The Rails framework includes a very popular Ruby implementation. Basically, every record is represented by one ActiveRecord subclass instance. By using “convention over configuration”, this active record requires little or no mapping for reading objects from and writing objects to a relational database. It reflects on schema information available from the database; every change is directly visible to the active record.

ActiveItem is a framework that uses the concepts of ActiveRecord and method signatures from its Rails implementation to provide a persistency layer on top of Amazon SimpleDB. Its aim is to provide a simple mechanism for persisting objects using the same conventions. SimpleDB is a schema-less container of items, each of them having attributes that are Strings only. Therefore ActiveItem requires a per-class description in which you specify what variables and types need to be stored.

For mapping typed instance variables to attributes, a simple converter interface is available. More advanced features are Sharding (strategy to decide which object goes in which domain), paging access to query results, Smalltalk block-to-select-where expression translation and more complex attribute specifications.

Until Cloudfork AWS is ported to other Smalltalk dialects, this framework is only available for Squeak and can be downloaded from SqueakSource.

The next blog post about ActiveItem will show the basics for getting started with the framework by showing some example code needed to store a ShoppingCart in Amazon SimpleDB:

| cart |
cart := ShoppingCart new.
cart createdAt: TimeStamp now.
cart customerName: 'Lisa'.
cart save.

Cloudfork AWS packages and some rough edges

January 5, 2009

Update: Installation instructions are described and maintained on Cloudfork @ googlecode

If you want to get started with Cloudfork AWS you need to know what packages to load and what the prerequisites are. The current implementation also has some rough edges that you need to be aware of. As mentioned on SqueakSource the prerequisite packages are:

  • Cryptography Team Package
  • HTTPClient version

Both packages can be installed using the SqueakMap Package Loader. The Cryptography package is needed for the HMAC-SHA1 or HMAC-SHA256 signature functions. Amazon won’t accept your requests without them. The HTTPClient is needed for handling the HTTP REST calls. Both prerequisites can be loaded using the Universe Browser.

Packages

For Cloudfork AWS you need to load three packages using Monticello:

Cloudfork-AWS: The main package, this package contains most of the code. Only the Smalltalk dialect specific functionality has been factored out.

Cloudfork-Squeak-Platform: Package with functionality that we think is Squeak specific. This is mostly done by subclassing base classes from Cloudfork-AWS.

Cloudfork-Tests-AWS: This optional package contains unit and integration tests. They can be used to verify that everything works correctly, and the integration tests in particular give a good example of how the APIs can be used.

We have used the Seaside 2.9 project as a good example on how to setup the package structure. Also note that before you can actually run the integration tests you need an AWS account that is enabled for the specific service you want to test (SimpleDB, SQS or S3). You can set your AWS account credentials using the class method defaultAwsKey: awsKey andSecret: aSecret of the class CFAWSIntegrationTest.

Rough edges

Although the packages are already usable, there are still some rough edges you need to be aware of:

  • Proxy support – If you want to use Cloudfork AWS from behind a proxy you need to define this somewhere. We haven’t yet made a public method to configure this. For now you will need to change the initialize method of CFAWSRESTSqueakAccess. The comment contains a sample proxy definition.
  • HTTPS – All the AWS services with a REST based interface can be used with plain HTTP and with secure HTTP (HTTPS). Until now we have only tested with plain HTTP.
  • UTF-8 – All HTTP calls to AWS should be encoded using UTF-8. Currently this works for the standard characters. We have not done any testing with special / non-western characters.
