From idea to a product

May 17, 2012

Selling the idea

We had a vision, and we sold it the way a lean startup would: sell the vision, get funding for the project, deliver the product. It turns out it is extremely hard to sell something that does not exist, but we did it. After a while we found our first customer, who believed in us and was the right kind of customer. We worked alongside them for about a month and formalized the vision into a product, writing schemas, functional specifications, project plans, time estimates and wireframes. I think they were quite supportive and not too demanding, and they saw a lot of value in what we were delivering. We also won a TSB grant, which helped us get through the initial stages and eventually hire our second developer.

Studying the idea

Working out how to realize what we had in mind took many iterations. None of us had all the knowledge needed to build what we had sold; the company did not have the expertise in house, nor could it afford anyone who did, so we had to learn a lot. I am not talking about frameworks and libraries, but about algorithms.
Luckily enough, the Stanford classes of the past year introduced us to the techniques we ended up using: machine learning and NLP above all.
Our original idea was to estimate how much revenue a company makes, within an acceptable margin of error, starting from publicly available data. There is a lot of it: some from Companies House, some from other financial institutions, some from the social sphere, some from search engines. Companies also give some information away on their own sites (e.g. testimonials, press releases, etc.), and more comes from the press releases of investors such as VCs. If you dig deep enough you will find a LOT of data, but there is always noise. Would this noise compromise our efforts?

Machine learning

It turns out ML is a vast field, much of it still unexplored. Neural networks are not the only way to teach a machine to make decisions the way humans do… there are many others. The human brain tunes itself automatically, but with a machine you have to pick the right algorithm: SVM, kNN, Naive Bayes, neural networks, linear or logistic regression.
Once you know these techniques, what we do is actually pretty simple: take known relationships between data and revenue, train the system on them and use it to predict revenue where we do not have it. In the real world, though, there is a lot of work to do on the data: smoothing it, making sure it overlaps in time, making sure the features you chose represent reality well, etc.
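To make "train on known data, predict the unknown" concrete, here is a minimal sketch using k-nearest-neighbour regression (one of the algorithms listed above) plus a moving-average smoother for noisy time series. It is not our actual model: the features, the choice of kNN, the value of k and the window size are all illustrative assumptions, and in practice you would scale the features so that no single one dominates the distance.

```python
import math
from typing import List, Sequence, Tuple

# A known example: (feature vector, revenue). Which features to use is an
# assumption (e.g. employee count, web traffic), ideally scaled to similar ranges.
Example = Tuple[Sequence[float], float]


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a noisy time series with a trailing moving average."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict_revenue(known: List[Example], features: Sequence[float], k: int = 3) -> float:
    """Estimate revenue as the mean revenue of the k most similar known companies."""
    nearest = sorted(known, key=lambda ex: euclidean(ex[0], features))[:k]
    return sum(revenue for _, revenue in nearest) / len(nearest)
```

With real data you would hold back some companies whose revenue you already know and compare the predictions against it, to check whether the error stays within the acceptable margin.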

Choosing the technology

Python was the perfect choice for me: it is the language I have most experience with, and a very good language for prototyping ideas. I had previously worked in a startup using Scala and was not able to reach the same development speed. Slow development is absolutely to be avoided in a startup; it is a motivation killer… In general, in a startup you should use things you know well, because it cuts development time. Forget all the cool tech you always wanted to try; try it in a personal project, not in a serious startup. I know it is a blow, but you have to accept it: it is a matter of life or death. We used MySQL, not Cassandra. I already knew MySQL very well and, to be honest, how many MySQL experts can you find on the market? And how many Cassandra experts?

Building the team

When I joined, it was just me and the CEO. It is a pretty exciting time to join a company: I had to set the direction for everything related to software development. We needed a team, but I was not sure exactly what kind of person… senior, junior, contractor? Our budget was limited, so the best choice turned out to be someone junior to mid-level with the potential to grow quickly in the organization. Which skills did we require in a developer? Initially I thought we needed someone who knew Python, but having thought about it, what we really needed was someone interested in what we do and in the algorithms we use. Python is such an easy language that you can pick it up in a couple of weeks, and if you can't, you are probably not the right kind of developer for us anyway.

Stay lean

All we needed was a shared folder for docs, a code repository, a virtual machine and a whiteboard with lots of post-it notes. You do not need issue tracking or a CI server when you start, but that is no excuse to get sloppy with unit tests. I like 5-10 minute stand-ups in the morning: you get a sense of what the team is up to and can offer solutions to problems quickly. There is not really time for documenting code either; at the start it is all in people's heads. I am a fan of writing code that reads like prose; that is the best documentation.

Lessons learnt

- You are in a startup usually because you are incentivized by the idea, team or equity in the company, not because you have a good salary. Make sure you understand this.
- Team is the most important thing. Ideally, team members have a similar level of experience but specialize in different things.
- Use what you already know, learn quickly what you don’t.
- Identify a market and clients before starting. Call them and sell them the product, before, during and after you build it.
- You have to tackle issues yourself, not wait for somebody else to do it.
- Write down the rules of the game before joining.

Launch!

We delivered our first version of the product a couple of weeks ago… we have a separate production VM for that :-)

The service is private at the moment, but a public version of it will be available in the coming months.

Ubuntu on EC2, the simple way.

May 6, 2012

Amazon EC2 is a virtual machine hosting service, a kind of IaaS (Infrastructure as a Service), quite similar to Linode or Rackspace. Unlike Linode, you pay by the hour… slightly on the expensive side, I might add, but the VMs are quite powerful.

The first step is to go through the setup procedure to get the EC2 tools working on your machine. I run Ubuntu on my laptop and followed the steps described here.

Once you have installed the API tools and put all the EC2 environment variables in your .bashrc file, type:

ec2-describe-images -o amazon

You should see the list of public AMIs from Amazon. If you don't, something is wrong with your configuration.

By default the firewall blocks access to every port; you have to explicitly allow access in the security group associated with your machine (or in the default security group):

ec2-authorize default -p 22

This opens the SSH port. The next step is to create the machine: I used Ubuntu 11.10 64-bit, EBS-backed. Its AMI ID is ami-895069fd. This image can run a bootstrap script at first boot, passed as user data:

ec2-run-instances ami-895069fd -t m1.large --user-data-file ~/ec2/bootstrap.sh

This is an example bootstrap file:

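#!/bin/bash
# Runs as root on first boot (passed with --user-data-file)

set -e -x
export DEBIAN_FRONTEND=noninteractive
apt-get update && apt-get upgrade -y

# Minimal X environment, window manager and VNC server
apt-get install -y xorg fluxbox vnc4server

# Fetch the VNC configuration (replace URL and credentials with your own)
wget --user="YOURUSER" --password="YOURPASS" -O /tmp/vnc-conf.tgz https://server/vnc-bootstrap.tgz
cd /home/ubuntu && tar xfvz /tmp/vnc-conf.tgz && chmod -R 700 .vnc
# The script runs as root, so give the extracted files back to the ubuntu user
chown -R ubuntu:ubuntu /home/ubuntu/.vnc

chmod 755 /etc/X11/xinit/xinitrc
su -c vnc4server ubuntu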

In this script I install all the packages I need and download some initial data. For anything more serious than this, I advise you to look into Puppet or Chef.
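For comparison, the same two steps (open port 22, launch an instance with the bootstrap script as user data) can be sketched from Python with boto3, the current AWS SDK for Python, which this post did not use. The region default is an assumption, and so is the availability of the 2012-era AMI: AMI IDs are region-specific and old images get deregistered, so substitute a current Ubuntu image and instance type.

```python
from typing import Any, Dict


def build_launch_request(ami_id: str, instance_type: str, user_data: str) -> Dict[str, Any]:
    """Keyword arguments for boto3's EC2 client run_instances()."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # boto3 base64-encodes UserData for run_instances, so pass the raw script
        "UserData": user_data,
        "SecurityGroups": ["default"],
    }


def launch(ami_id: str, instance_type: str, bootstrap_path: str, region: str = "us-east-1") -> str:
    """Open SSH in the default group, then start one instance; returns its ID.

    Requires boto3 and configured AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so build_launch_request() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    # Rough equivalent of: ec2-authorize default -p 22
    # (raises a ClientError if the rule already exists)
    ec2.authorize_security_group_ingress(
        GroupName="default", IpProtocol="tcp", FromPort=22, ToPort=22, CidrIp="0.0.0.0/0"
    )
    with open(bootstrap_path) as f:
        user_data = f.read()
    response = ec2.run_instances(**build_launch_request(ami_id, instance_type, user_data))
    return response["Instances"][0]["InstanceId"]
```

Usage would look like `launch("ami-895069fd", "m1.large", os.path.expanduser("~/ec2/bootstrap.sh"))`. Keeping the request builder separate from the network calls makes it easy to check what would be sent without touching AWS.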

Text classification in Python

March 21, 2012

Python and NLTK make quite a good platform for text analysis. There is a lot of information on the Internet, yet I have not found a clean and simple example of a classifier. Text classifiers draw on both Natural Language Processing and Machine Learning; in fact, I think they sit exactly in the middle of the two.

Bearing in mind that a good classifier is only possible with a training set that represents reality well, and is certainly much larger than the one in this example, here is a first stab at it:

import itertools
import random
import re

import nltk

# Needs NLTK data: nltk.download('stopwords') and nltk.download('punkt')
# ('punkt_tab' on recent NLTK releases)


class Classifier(object):
    """classify by looking at a site"""
    def __init__(self, training_set):
        self.training_set = training_set
        self.stopwords = set(nltk.corpus.stopwords.words("english"))
        self.stemmer = nltk.PorterStemmer()
        self.minlength = 7   # keep tokens longer than this...
        self.maxlength = 25  # ...and shorter than this

    def text_process_entry(self, example):
        # nltk.clean_html() was removed in NLTK 3: strip HTML tags with a simple regex
        site_text = re.sub(r"<[^>]+>", " ", example[0]).lower()
        original_tokens = list(itertools.chain.from_iterable(
            nltk.word_tokenize(s) for s in nltk.sent_tokenize(site_text)))
        tokens = original_tokens #+ [' '.join(w) for w in nltk.util.ngrams(original_tokens, 2)]
        tokens = [w for w in tokens if w not in self.stopwords]
        tokens = [w for w in tokens if self.minlength < len(w) < self.maxlength]
        #tokens = [self.stemmer.stem(w) for w in tokens]
        return (tokens, example[1])

    def text_process_all(self, exampleset):
        processed = [self.text_process_entry(i) for i in exampleset]
        processed = [x for x in processed if len(x[0]) > 0]  # remove empty crawls
        processed_texts = [i[0] for i in processed]

        # the 5000 most frequent words become the candidate features
        all_words = nltk.FreqDist(itertools.chain.from_iterable(processed_texts))
        self.features_to_test = [w for w, _ in all_words.most_common(5000)]

        featuresets = [(self.document_features(d), c) for (d, c) in processed]
        return featuresets

    def document_features(self, document):
        document_words = set(document)  # fast membership tests
        features = {}
        for word in self.features_to_test:
            features['contains(%s)' % word] = (word in document_words)
            #features['occurrences(%s)' % word] = document.count(word)
            #features['atleast3(%s)' % word] = document.count(word) > 3
        return features

    def build_classifier(self, featuresets):
        # hold out 20% of the examples for testing
        random.shuffle(featuresets)
        cut_point = len(featuresets) // 5
        train_set, test_set = featuresets[cut_point:], featuresets[:cut_point]
        classifier = nltk.NaiveBayesClassifier.train(train_set)
        return (classifier, test_set)

    def run(self):
        featuresets = self.text_process_all(self.training_set)
        classifier, test_set = self.build_classifier(featuresets)
        self.classifier = classifier
        self.test_classifier(classifier, test_set)

    def classify(self, text):
        # apply the same preprocessing as the training data before extracting features
        tokens, _ = self.text_process_entry((text, None))
        return self.classifier.classify(self.document_features(tokens))

    def test_classifier(self, classifier, test_set):
        print(nltk.classify.accuracy(classifier, test_set))
        classifier.show_most_informative_features(45)

classes = ('a la carte', 'advertising', 'commission', 'investment', 'pay as you go')

training_set = [
    ('we are a bank specialized in dealing with IT companies', classes[3]),
    ('we sell our product at a fixed cost of 10 pounds', classes[0]),
    ('the cost per click is 0.01 dollars but if you get more than 10000 impression the cost will be 0.12', classes[1]),
    ('we take a 1% commission on all sales, overseas sales have an additional charge of 12%', classes[2]),
    ('we charge a 1% on top of your final price.', classes[2]),
    ('we sell our product at 5 pounds, excluding with the variant A which costs an extra of 55 pounds', classes[0]),
    ('we sell our product at 6 pounds, excluding with the variant B which costs 45 pounds', classes[0]),
    ('our commission is normally between 1% and 2%', classes[2]),
    ('impressions on the homepage on sundays are worth 0.01 pounds', classes[1]),
    ('we will show impressions only to users that correspond to certain criteria.', classes[1]),
    ('we manage an hedge fund and we take care of placing investments on behalf of our clients', classes[3]),
    ('we bill only for the amount of api you use. 0.10 per 1000 calls', classes[4]),
    ('running a virtual machine will cost you 0.12 pounds per hour', classes[4]),
    ('we invest in major hedge funds', classes[3]),
    ('we are an international bank, based in all countries of europe', classes[3]),
]

test_text = "we are a hedge fund collaborating with many banks in europe"
test_text2 = "we charge a fixed fee on top of our client's sales"

if __name__ == '__main__':
    classifier = Classifier(training_set)
    classifier.run()
    print("%s -> classified as: %s" % (test_text, classifier.classify(test_text)))
    print("%s -> classified as: %s" % (test_text2, classifier.classify(test_text2)))

You can run this code to classify companies by their revenue model, i.e. how they charge their customers. Some of the lines above are commented out; uncomment them if you think they give a better representation of the examples. Add another 1000 or so good examples and it should start making accurate decisions… enjoy!

Why SOA in a startup sucks

January 22, 2012

By Service Oriented Architecture I mean an architecture where each component can be deployed and used separately from the others. Take Amazon as an example: much of their software stack is offered as a service, mainly through an HTTP interface, and marketed as SaaS.

There are plenty of reasons to think of your system as a set of separately deployable components: flexibility, being able to choose the right tool for each component, etc. I will not dwell on the advantages here.

Sometimes SOA is convenient, but there is a lot of hype around it, and I want to write down what I learned (the hard way) about building a system with this architecture in mind.

Development cycle is too slow

When you need to change or add functionality that spans more than one component, you have to code it in each component and redesign the interaction between them. If you are using REST, it means you will end up modifying URLs and the data they return. To sum up: time to add code in component A, time to add code in component B, time to redesign and code the changes in the interaction, time to redeploy them… It is quite different from the “everything in one component” scenario, where the only thing you do is code.

Poor testability

It is quite difficult to test functionality that spans multiple services, especially when persistence is involved. How do you limit the side effects on each component? Perhaps a testing flag passed through an HTTP request is enough, but that can mean quite a few internal changes which you could skip if everything were integrated.
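A minimal sketch of the "testing flag" approach shows where those internal changes come from. It assumes two hypothetical services, A (checkout) and B (orders), talking over HTTP (simulated here with a plain function call) and a made-up `X-Test-Mode` header: every service on the path has to read the flag, forward it, and switch its persistence accordingly.

```python
from typing import Dict, List


class OrderStore:
    """Hypothetical persistence layer of service B (in memory for the sketch)."""

    def __init__(self) -> None:
        self.orders: List[str] = []

    def save(self, order_id: str) -> None:
        self.orders.append(order_id)


PRODUCTION_STORE = OrderStore()
TEST_STORE = OrderStore()
TEST_HEADER = "X-Test-Mode"  # hypothetical header name


def create_order(headers: Dict[str, str], order_id: str) -> Dict[str, str]:
    """Service B's endpoint: has to choose a store based on the flag."""
    store = TEST_STORE if headers.get(TEST_HEADER) == "1" else PRODUCTION_STORE
    store.save(order_id)
    return {"status": "created", "order_id": order_id}


def checkout(headers: Dict[str, str], order_id: str) -> Dict[str, str]:
    """Service A: must remember to forward the flag on every downstream call.
    The direct call to create_order() stands in for the HTTP request to B."""
    downstream_headers = {}
    if TEST_HEADER in headers:
        downstream_headers[TEST_HEADER] = headers[TEST_HEADER]
    return create_order(downstream_headers, order_id)
```

In a single integrated application, the test would simply build the checkout code with a test store, with no flag to thread through every layer.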

Swapping mindsets (and related frustration)

This is not a technical reason but a practical one. When juggling two different pieces of software, you have to switch mindset and get up to speed again each time. This mental effort costs you time that you could have spent otherwise.

 

Passion and Pragmatism in IT

November 3, 2011

“Do what you love” is what Steve Jobs, for instance, said. It does not get truer than that. Hacking has been labeled a negative word, but for me it has a positive meaning: the impulse to open things up, learn how they work and change them to make them better. Hacking for me is a passion, something I always do with pleasure.

Having this attitude in IT is important: only experimenting with things pushes your skills forward. Courses do not give you the ability to do things; they are just a kickstart for your hacking. Learning is one of the things I believe I am good at; I find myself doing it without realizing.

But for working in IT, hacking by itself is not a great quality if you are not able to control where it is going. Delivering results is important, because it is the manifestation of your knowledge. If you are not able to make things concrete, how can anyone know whether you are good at what you do?

Pragmatism is learning what is needed in practice and doing it. In practice, a product that comes only from passion is nice to have but does not directly satisfy a need. In contrast, if you direct your hacking towards satisfying a need, it becomes a killer feature.

Needs always come with a time frame. Today I know I will need to eat; if I skip today and eat tomorrow I will survive, but I will be pretty upset. People work to satisfy needs, not because they are passionate about it (although I hope they like it), so if they ask you for something at work, your passion must be directed towards their need. That is the only way you will get money and success in the long term. Money satisfies your needs; success is the intellectual reward for your passion. Both are important.

Unfortunately work is not always fertile ground for exploration, at least not forever. In the end, if hacking is important to you, switch jobs to keep things moving. I know it can be daunting for some, but sometimes you just have to take a leap of faith.

Logging

April 8, 2011

I work on e-commerce platforms, where logging is a critical component. Learning how to do it correctly will save you and everybody else time when problems happen. Sometimes it is not just about saving time, but also about being able to give customers correct answers when things go wrong. When interacting with third parties, logging is even more important, because it lets you understand where the problem lies: in your code or in the external service.

The first basic rule is to use a logging framework. There are plenty out there, for every language. Modern logging frameworks are quite sophisticated: they support different destinations (file, database, etc.), log rotation, log levels, email alerts, etc. Refrain from inventing your own; existing ones are usually quite extensible anyway. All the frameworks I have seen share roughly the same level-based interface, and they are quite easy to use.
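As an illustration, here is a minimal setup with Python's standard `logging` module covering the features mentioned above: levels, a rotating log file, and optional email alerts for critical records. The logger name, rotation sizes, SMTP host and sender address are assumptions to adapt to your application.

```python
import logging
from logging.handlers import RotatingFileHandler, SMTPHandler
from typing import Optional


def configure_logging(path: str, alert_email: Optional[str] = None) -> logging.Logger:
    """File logging with rotation, plus optional email alerts for CRITICAL records."""
    logger = logging.getLogger("shop.billing")  # hypothetical logger name
    logger.setLevel(logging.INFO)  # DEBUG records are dropped

    # Rotate at ~10 MB, keeping 5 old files (path.1 ... path.5)
    file_handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=5)
    file_handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s [%(module)s:%(lineno)d] %(message)s"))
    logger.addHandler(file_handler)

    if alert_email:
        # Only CRITICAL records trigger an email; host and sender are assumptions
        mail_handler = SMTPHandler("localhost", "noreply@example.com", [alert_email],
                                   "CRITICAL error in billing")
        mail_handler.setLevel(logging.CRITICAL)
        logger.addHandler(mail_handler)
    return logger
```

The application code then only calls `logger.info(...)`, `logger.error(...)` and so on; where the records end up is decided in one place.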

Once your framework is set up, there is a lot to say about how you log things. Do not assume that people reading your application logs know your code inside out. In fact, chances are you will not remember how you coded a functionality yourself when looking at the logs of old code. Do not give implementation details in the log files; describe what the code is doing using the language of the application domain. If you are building a billing application, use billing-specific vocabulary. If it is a payment gateway, try to reuse the 3rd party's terminology: if you need to, you can quickly look up the term in their documentation and find out what it means. That is something you cannot do if you made the term up.

Once you have chosen how to log things, go through every log call and give it a severity level. Even someone who does not know the application domain knows they should start worrying on seeing lots of “Critical” entries in the log files. If you see one, it is probably a good idea to take a look at the code or at least notify the author.

For critical errors, or errors that need to be fixed, it is useful to include stats (CPU, memory usage, etc.) and the location of the code that was being executed. If an exception was raised, log its stack trace.
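A sketch combining the last three paragraphs, assuming a hypothetical payment flow and gateway client: messages use the payment domain's own vocabulary (“capture”, “authorised”), and a failure is logged at CRITICAL together with process stats and the stack trace. It uses only Python's standard `logging`, `time` and `resource` modules; `resource` is Unix-only, so the memory figure is skipped elsewhere.

```python
import logging
import time
from typing import Callable

try:
    import resource  # Unix only
except ImportError:
    resource = None

logger = logging.getLogger("shop.payments")  # hypothetical logger name


def process_stats() -> str:
    """CPU time used so far and, on Unix, peak resident memory (KB on Linux)."""
    stats = "cpu_time=%.2fs" % time.process_time()
    if resource is not None:
        stats += " max_rss=%d" % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return stats


def capture_payment(order_id: str, amount: str,
                    gateway_capture: Callable[[str, str], None]) -> bool:
    """Capture a previously authorised payment; gateway_capture is a hypothetical client call."""
    logger.info("Capturing payment for order %s, amount %s", order_id, amount)
    try:
        gateway_capture(order_id, amount)
    except Exception:
        # Domain wording, process stats and the stack trace (exc_info=True);
        # a %(module)s:%(lineno)d formatter field adds the code location
        logger.critical("Payment capture FAILED for order %s, amount %s (%s)",
                        order_id, amount, process_stats(), exc_info=True)
        return False
    logger.info("Payment captured for order %s", order_id)
    return True
```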

How small websites become big

March 30, 2011

There is no secret recipe and no list of boxes to tick… just some guidelines. Some of these lessons I learned the hard way; others come from what I have always been taught: if you want to be the best, you have to copy the best. There is plenty of literature on the Internet about this… read, understand and copy.

I think the art of building high-traffic websites is partly about the code, but mostly about your web architecture and the tools you use. Here are some points, from basic to advanced.

  • Separating the web server from the database server is a basic step. Do it for safety, if not for speed: if the database server gets overloaded, your web server will still be up and running, and if one of the two breaks, you need half the time to get back online. If the website becomes much slower after the split, you are over-relying on the database and may have to rethink how you use it.
  • Databases are often the main bottleneck of your website. Anything I/O related is a bottleneck because, no matter what server you have, disks will always be orders of magnitude slower than memory. Consider running database servers on physical machines with properly fast hardware rather than on virtual servers.
  • Optimize your queries and use indexes on the fields you search on frequently. This can make a big difference. Databases are weird creatures; you need to know them well before feeling safe.
  • Caches are both a blessing and a curse. Systems like Memcached (or Redis) really make a difference. Install Memcached on every web server machine and cache the results of all the SELECTs that can be reused in the next X minutes. When the cache is empty, execute the query on the database and put the results in the cache for later retrieval.
  • Optimization only makes sense in certain areas of the code. Use profiling tools to see which functions/classes are executed most often and make that code fast.
  • Do not blindly believe an ORM is always a good solution. For heavy database tasks, do not use one.
  • Move all your static files to a static web server and serve them from there instead of the main web server. You split the load without any complex configuration change other than changing the base href in the HTML. If you have a very large number of files, you may want to tune the filesystem for it.
  • For static files, use a lightweight asynchronous web server like Nginx. This matters especially if you send emails with lots of images: people tend to open emails as soon as they get to work or during lunch, so you will get very high traffic peaks during those hours. Asynchronous web servers handle traffic spikes much better than traditional web servers.
  • Start adding web servers. If you use sessions, you need to store them somewhere shared by all web servers, such as the database or a shared drive. A shared drive is generally a good idea: put your application on it, and when you upgrade you only need to do it in one location.
  • Start thinking about reverse proxies, load balancers and HTTPS accelerators (listed here in order of cost). A reverse proxy solves the so-called “spoon feeding” problem quite well, and can also serve cached responses if configured properly. Nginx is my favourite, followed by Varnish for complex caching policies.
  • Database servers are not “full text” search servers. Search is an expensive operation and should be done on dedicated systems, especially if website users search frequently.
  • If you have a lot of offline data processing to do, do it on a dedicated server. You may want to look into Hadoop if the volumes of data are enormous.
  • The more code and servers you have, the more likely it is that something goes wrong. Learn to log events properly, with all the information you may need. In areas where performance is really important, you may want to consider conditional logging. It is always better to have some logging than extremely fast code that cannot be debugged when it fails.
  • Automate! Having a script for everything is important. Deployment is one of the first things to automate, especially when more than one server is involved.
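The cache-aside pattern from the Memcached point can be sketched as follows. A small in-process dictionary with expiry stands in for Memcached so that the example runs on its own; with a real client you would call its get/set methods instead, and hash the SQL into the key, since Memcached keys are limited to 250 characters and cannot contain spaces. The class and function names are illustrative.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-process stand-in for Memcached: values expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[Hashable, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key: Hashable, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached_query(cache: TTLCache, sql: str, params: tuple,
                 run_query: Callable[[str, tuple], Any], ttl: float = 300) -> Any:
    """Cache-aside: try the cache first, fall back to the database on a miss.
    Note: a query returning None is never cached, because None means "miss"."""
    key = (sql, params)
    result = cache.get(key)
    if result is None:
        result = run_query(sql, params)  # hit the database
        cache.set(key, result, ttl)      # keep the result for the next ttl seconds
    return result
```

The "curse" side of caching shows up in the `ttl`: until it expires, readers may see stale rows, so choose X minutes per query according to how fresh the data must be.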

I think I have covered a lot here. There is much more to say, but it is more about managing the code and the project itself. Maybe in another post.

Functional Programming ideas in OOP

March 27, 2011

About a year and a half ago I became interested in Scala. Scala is a hybrid between an object-oriented and a functional language, and while using it I learned to appreciate the functional part more and more. I will not hide that it was also the most difficult part of learning Scala.

The mindset for solving problems with FP is different, because it forces you to think in terms of transformations (mappings) rather than step-by-step algorithms. Type systems are also very strong, stronger than in the OOP languages I know.

Without letting the rant go too far, I found that my OOP style is now strongly influenced by functional thinking:

  • lots of small, short functions with well-defined behaviour.
  • functional code is usually more testable because pure functions, by definition, have no side effects: a function does only one thing, and applying it to a state A always returns the same state B. This relates to Referential Transparency.
  • mutable state is partly responsible for the exponential increase in complexity as you stack up code, whatever technique you use: inheritance, composition, etc. An object variable that changes state inside a nested object is usually quite difficult to follow. I am not saying all variables need to be read-only, but limiting the scope in which variables are written and overwritten is good.
  • if you can choose between a stateless and a stateful implementation, and you are working on a business domain you do not know well yet, go stateless. Stateless implementations are easier to change.
  • type systems are a pain at first, but they force you to write safe code and ultimately produce better code. I found there are not many cases in which you want automatic casting to happen. It also clashes with the “fail fast” rule, which is very high on my list of priorities. Types checked and inferred before the program runs are generally what you want, not types that change at runtime.
  • I found that FP's use of types leads to more specific code. For instance, I would rather use a “Currency” type than a float. Being specific is good: it leaves less room for doubt. This is not an absolute rule: if performance is crucial and the compiler does not optimize such a wrapper type away, it can be a big mistake.
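To illustrate the “Currency rather than float” point together with immutability and failing fast, here is a minimal sketch in Python rather than Scala: an immutable `Money` value object that refuses floats and mixed currencies instead of silently converting, and a pure `total` function. The `Money` name and its rules are illustrative assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)  # immutable: "changing" a value creates a new object
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code, e.g. "GBP"

    def __post_init__(self) -> None:
        # Fail fast: no silent float -> Decimal conversion
        if not isinstance(self.amount, Decimal):
            raise TypeError("amount must be a Decimal, not %s" % type(self.amount).__name__)

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add %s to %s" % (other.currency, self.currency))
        return Money(self.amount + other.amount, self.currency)


def total(prices: Iterable[Money], currency: str) -> Money:
    """Pure function: modifies nothing; the same input always gives the same output."""
    return reduce(lambda acc, price: acc + price, prices, Money(Decimal("0"), currency))
```

Adding pounds to euros now raises an error at the exact line where it happens, instead of producing a plausible-looking but wrong float further down.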

I am sure there is a lot more to FP, but this is it for now. These are personal opinions; I am not a language theorist. I am interested in practical consequences, and this is what is happening to my way of working.

CDN Optimizations

October 17, 2010

One of our most heavily trafficked websites serves on average 300,000 page views per day. Each page normally carries a considerable amount of JavaScript, some of which runs only after the whole DOM has loaded.

Since every page loads on average 20-30 images from our image server, every small optimization there has an avalanche effect on all the other parts of the system.

I already described the infrastructure in the Nginx post. The changes I made to that configuration are all I/O related, aimed at minimizing disk writes and internal connections to Apache, but the biggest change is the duration of the nginx cache, which is now 6 hours instead of one hour. The impact on the main site has been remarkable.
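The numbers above can be combined into a quick back-of-the-envelope calculation. It assumes every distinct image is requested at least once per cache window, so that nginx has to fetch it from Apache once per expiry; under that assumption, moving from a 1-hour to a 6-hour cache cuts backend fetches per image from up to 24 a day to 4. The function names are illustrative.

```python
import math


def daily_image_requests(page_views: int, images_per_page: int) -> int:
    """Image requests reaching the image server's front end per day."""
    return page_views * images_per_page


def max_backend_fetches_per_image(cache_hours: float) -> int:
    """Upper bound on backend (Apache) fetches per image per day, assuming the
    image is requested at least once in every cache window."""
    return math.ceil(24 / cache_hours)


if __name__ == "__main__":
    low, high = daily_image_requests(300_000, 20), daily_image_requests(300_000, 30)
    print("image requests per day: %d to %d" % (low, high))  # 6000000 to 9000000
    for hours in (1, 6):
        print("%d h cache: up to %d backend fetches per image per day"
              % (hours, max_backend_fetches_per_image(hours)))  # 24, then 4
```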

Dbunit testing

October 17, 2010

This article is about PHPUnit used in conjunction with DBUnit to test PHP code that interacts with a database server.

Please note that DBUnit is able to load and unload sets of data into the database, but it does not handle table creation or queries that alter structures. That is the responsibility of an ORM or of an initial SQL script that creates/rebases the initial environment.

The code below runs against a very simple ORM implementation we currently use at my workplace. It should be simple enough to follow.

DB Unit test

A DBUnit test is basically a unit test that inherits from PHPUnit_Extensions_Database_TestCase and implements two more methods: getConnection() and getDataSet(). The first must return DBUnit's wrapper around a PDO connection, the second a dataset created with one of the create*DataSet() methods.

class TestAddress extends PHPUnit_Extensions_Database_TestCase
{
    private $fixture_addressId;
    private $fixture_location;

    public function setUp()
    {
        parent::setUp();
        $this->fixture_addressId = "5";
        $this->fixture_location = "Melbourne";
    }

    protected function getConnection()
    {
        $pdo = getPdo(); // replace this code
        // second argument is the schema name of the test database
        return $this->createDefaultDBConnection($pdo, 'testdb');
    }

    protected function getDataSet()
    {
        return $this->createFlatXMLDataSet(dirname(__FILE__).'/../fixtures/db-addresses.xml');
    }

    public function testBasicFixtureLoading()
    {
        $mapper = new address_Mapper();
        $address = $mapper->findById($this->fixture_addressId);

        // the fixture stores the city in addressLine4; assertEquals takes (expected, actual)
        $this->assertEquals($this->fixture_location, $address->addressLine4);
    }

    public function testSave()
    {
        $mapper = new address_Mapper();
        $address = $mapper->findById($this->fixture_addressId);
        $address->addressLine3 = "London";
        $mapper->save($address);
        unset($address);

        // reload from the database to check that the change was persisted
        $address = $mapper->findById($this->fixture_addressId);
        $this->assertEquals("London", $address->addressLine3);
    }
}

Model object

Model objects are a representation of data; they are basically data containers. They must not contain any integration logic (e.g. database queries); that is the responsibility of mapper objects.

class Address extends Model
{
	public function __construct()
	{
		parent::__construct('addressId');
		$this->setFieldNames(array(
			'addressId',
			'firstName',
			'lastName',
			'addressLine1',
			'addressLine2',
			'addressLine3',
			'addressLine4',
			'state',
			'postCode',
			'country',
			'createdDate'
			));
	}
}

Mapper object

A mapper object is responsible for populating and returning the corresponding data models; in this case, address models. Our Mapper base class already offers generic find()/insert()/update()/delete() operations, but you may want to extend it with other find methods.

class address_Mapper extends Mapper
{
	const STORAGE_NAME = 'Addresses'; // table name

	public function __construct()
	{
		parent::__construct(getPdo(), self::STORAGE_NAME, 'addressId');
	}

    public function findById($addressId)
    {
        $identity = array('addressId' => $addressId);
        $model = new Address;
        parent::find($identity, $model);
        return $model;
    }
}

Fixture datasets

These are our test data. Each DBUnit test has its own dataset. PHPUnit supports several formats; the one used here is called FlatXMLDataSet, and it is really simple.

<?xml version="1.0" encoding="UTF-8" ?>
<dataset>
 <Addresses
   addressId="5"
   firstName="myName"
   lastName="mySurname"
   addressLine1="myAddr1"
   addressLine2="myAddr2"
   addressLine3="myAddr3"
   addressLine4="Melbourne"
   state=""
   country="AU"
   postCode="3400"
   createdDate="2009-10-18 18:16:19"
  />

 <Addresses
   addressId="6"
   firstName="Another"
   lastName="Person"
   addressLine1="Somewhere"
   addressLine2=""
   addressLine3="London"
   addressLine4=""
   state=""
   country="UK"
   postCode="XXX111"
   createdDate="2009-10-18 18:00:19"
  />
</dataset>

Each child element of <dataset> follows the syntax <TABLE_NAME column1="value" column2="value" ...>, one element per row. Nothing more than that: no structure, only data.
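Because the format is just “one element per row, one attribute per column”, it is easy to load by hand. The following sketch uses Python and SQLite instead of PHP and MySQL, purely to show what a FlatXML dataset means to the database; it mimics DBUnit's default CLEAN_INSERT setup operation (empty each table named in the dataset, then insert the rows). The function name is hypothetical, and table and column names are interpolated unescaped, which is acceptable only for trusted fixture files.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_flat_xml_dataset(conn: sqlite3.Connection, xml_text: str) -> int:
    """Clean-insert a FlatXML dataset: tag = table, attributes = column values.
    Returns the number of rows inserted."""
    rows = list(ET.fromstring(xml_text))
    # Like DBUnit's default CLEAN_INSERT: empty every table in the dataset first
    for table in {row.tag for row in rows}:
        conn.execute('DELETE FROM "%s"' % table)
    for row in rows:
        columns = list(row.attrib)
        sql = 'INSERT INTO "%s" (%s) VALUES (%s)' % (
            row.tag,
            ", ".join('"%s"' % c for c in columns),
            ", ".join("?" for _ in columns),
        )
        # Values are passed as parameters; all attribute values are strings
        conn.execute(sql, [row.attrib[c] for c in columns])
    conn.commit()
    return len(rows)
```

Note that the dataset only ever touches rows, never table definitions, which is why the schema has to be created separately.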

Bootstrapping

As mentioned, DBUnit does not load table definitions. The initial environment must be set up before the DBUnit tests run, and this can be done with a PHPUnit bootstrap file (see the --bootstrap option).

Dropping and recreating the tables every time is not strictly necessary, but it is highly advisable: a test that is supposed to fail might pass because it reads data that should not be in the database. That is why we need to control the execution environment as much as possible.

<?php
define("FIXTURE_DB_REBASE", dirname(__FILE__)."/fixtures/db-rebase.sql");

// To test database-dependent classes you need a local database with the following settings
$host = 'localhost';
$user = 'unittests';
$password = 'myPassword';
$dbName = 'app_UnitTests';

// Create a database adapter (PDO MySQL DSN format: mysql:host=...;dbname=...)
try {
    $dbh = new PDO("mysql:host=".$host.";dbname=".$dbName, $user, $password);
    // make failing queries throw PDOException instead of returning false
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
} catch (PDOException $e) {
    echo 'Connection failed: ' . $e->getMessage() . "\n";
    exit(1);
}

// Ensure database credentials work
try {
    $results = $dbh->query("SHOW TABLES")->fetchAll();
} catch (PDOException $e) {
    echo "You need to create a local test database - see bootstrap.php for more details\n";
    echo "Connection error: ".$e->getMessage()."\n";
    exit(1);
}

// rebase the database: drop and recreate every table
$dbh->exec(file_get_contents(FIXTURE_DB_REBASE));

DB rebase SQL fixture

--
-- Table structure for table 'Addresses'
--

DROP TABLE IF EXISTS Addresses;
CREATE TABLE Addresses (
  addressId int(10) unsigned NOT NULL auto_increment,
  firstName varchar(128) NOT NULL,
  lastName varchar(128) NOT NULL,
  addressLine1 varchar(256) NOT NULL,
  addressLine2 varchar(256) NOT NULL,
  addressLine3 varchar(256) NOT NULL,
  addressLine4 varchar(256) NOT NULL,
  state varchar(128) NOT NULL,
  country varchar(128) NOT NULL,
  postCode varchar(32) NOT NULL,
  createdDate datetime NOT NULL,
  PRIMARY KEY  (addressId)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

