There are many chatbot services available in the market, such as AWS Lex, Dialogflow, and Chatfuel. These chatbots come with a lot of benefits: they are low cost, highly available and scalable, and they offer free integrations with messaging platforms like Facebook Messenger, WhatsApp, Slack, SMS, etc.
However, these chatbot platforms don't really solve the issue of data privacy and data segregation between countries. This is where open source chatbot frameworks come to play an important role. With open source frameworks, you also get the following benefits:
- Extremely low cost
- Extremely high customisability and flexibility
Overview

In this article, I will use the Serverless Framework, Python, and Snips NLU to build a chatbot service equivalent to what AWS Lex and others like Dialogflow offer. The chatbot I'm building will offer the following functionality:
- API endpoint to train the chatbot
- API endpoint to make chatbot inferences and return the predicted intent
Training
First of all, let's create a file called src/trainer.py. The training process contains the following steps:
1.Get the training dataset from the S3 bucket.
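Here is a sketch of what a sample training dataset looks like in the Snips NLU JSON format; the sampleGetWeather intent, location entity, and weatherLocation slot simply mirror the sample output shown later and are illustrative only, not the exact dataset from the repo.
{
  "language": "en",
  "intents": {
    "sampleGetWeather": {
      "utterances": [
        {
          "data": [
            { "text": "Will it be sunny in " },
            { "text": "London", "entity": "location", "slot_name": "weatherLocation" }
          ]
        }
      ]
    }
  },
  "entities": {
    "location": {
      "data": [
        { "value": "London", "synonyms": [] }
      ],
      "use_synonyms": true,
      "automatically_extensible": true,
      "matching_strictness": 1.0
    }
  }
}
The load_model function below downloads this file from the S3 bucket: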
def load_model():
    global bucket_name
    model_file_path = "/tmp/raw_data.json"
    s3 = boto3.resource('s3')
    print("downloading training data to {}".format(model_file_path))
    s3.meta.client.download_file(bucket_name, "bot/raw_data.json", model_file_path)
    train_model(model_file_path)
The above function uses the download_file S3 API to download the training data from the S3 bucket.
2.Train the chatbot by using Snips NLU
def train_model(model_file_path):
    global nlu_engine
    print("reading model at {}".format(model_file_path))
    with io.open(model_file_path) as f:
        model = json.load(f)
    # CONFIG_EN is imported from snips_nlu.default_configs
    nlu_engine = SnipsNLUEngine(config=CONFIG_EN)
    print("training model")
    nlu_engine.fit(model)
It loads the downloaded training data and uses the fit function to train the Snips NLU model.
3.Upload model to S3 bucket
trained_model = nlu_engine.to_byte_array()
s3 = boto3.client('s3')
print("uploading training result")
s3.put_object(Bucket=bucket_name,
              Key="bot/model.json",
              Body=trained_model,
              )
to_byte_array exports the trained model as binary data, and s3.put_object then uploads the binary content to the S3 bucket.
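To tie the three steps together, the Lambda entry point for the training endpoint can be a thin wrapper around them. The sketch below is illustrative; the handler name and response shape are assumptions, not necessarily what the repo uses.
# Hypothetical handler in src/trainer.py wiring the steps above together;
# it relies on the module-level imports and the bucket_name / nlu_engine
# globals used in the snippets above
def handler(event, context):
    load_model()  # steps 1 and 2: download the dataset and fit the engine
    # step 3: export the trained engine and upload it to S3
    trained_model = nlu_engine.to_byte_array()
    s3 = boto3.client('s3')
    s3.put_object(Bucket=bucket_name, Key="bot/model.json", Body=trained_model)
    return {"statusCode": 200, "body": json.dumps({"status": "trained"})}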
Inference

Create a file called src/intent_processor.py. The inference process contains the following steps:
1.Get the latest version number of the trained model
def load_latest_model():
    global latest_version
    global bucket_name
    if len(bucket_name) > 0:
        client = boto3.client('s3')
        response = client.head_object(Bucket=bucket_name, Key="bot/model.json")
        print("model version: {}".format(latest_version))
        current_version = response.get('VersionId', response.get("LastModified", "0"))
        if latest_version != current_version:
            latest_version = current_version
            print("not latest version")
    else:
        raise Exception("Config bucket is undefined")
    download_model(latest_version)
The head_object function is the AWS S3 API call that retrieves the metadata of the model file. The returned data contains either the VersionId (when bucket versioning is enabled) or the LastModified date of the file.
2.Download the latest model from S3 bucket
def download_model(model_version):
    global bucket_name
    model_file = "{}.json".format(model_version)
    model_file_path = "/tmp/models/{}".format(model_file)
    if not os.path.isfile(model_file_path):
        print("model file doesn't exist, downloading new model to {}".format(model_file_path))
        s3 = boto3.resource('s3')
        if not os.path.exists('/tmp/models'):
            os.makedirs('/tmp/models')
        s3.meta.client.download_file(bucket_name, "bot/model.json", model_file_path)
    load_model(model_file_path)
The download_model function checks whether the latest model exists in the /tmp folder. If it doesn't exist, it downloads the model file to the /tmp folder.
3.Load model to Snips NLU
def load_model(model_file_path):
    global nlu_engine
    print("reading model at {}".format(model_file_path))
    with io.open(model_file_path, 'r+b') as f:
        model = f.read()
    nlu_engine = SnipsNLUEngine.from_byte_array(model)
It opens the model file in binary mode and loads it using SnipsNLUEngine's from_byte_array function.
4.Intent Inference
body = json.loads(event.get('body', '{}'))
response = nlu_engine.parse(body.get('message', ''))
The body field exists in the event parameter passed into the Lambda handler function and is a stringified JSON object. Its message field contains the content that needs to be processed by the chatbot model. A sample output is shown below:
{
  "input": "Will it be sunny in London",
  "intent": {
    "intentName": "sampleGetWeather",
    "probability": 0.7527030377152906
  },
  "slots": [
    {
      "range": {
        "start": 20,
        "end": 26
      },
      "rawValue": "London",
      "value": {
        "kind": "Custom",
        "value": "London"
      },
      "entity": "location",
      "slotName": "weatherLocation"
    }
  ]
}
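For completeness, the four inference steps can be wired into the Lambda handler for the processIntent endpoint roughly as follows; the handler name and response shape are assumptions for illustration.
# Hypothetical handler in src/intent_processor.py; it relies on the
# module-level imports and the nlu_engine global used in the snippets above
def handler(event, context):
    load_latest_model()  # steps 1-3: refresh and load the trained engine
    body = json.loads(event.get('body', '{}'))
    response = nlu_engine.parse(body.get('message', ''))  # step 4: intent inference
    return {"statusCode": 200, "body": json.dumps(response)}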
Serverless

The serverless configuration contains the training and inference functions, the corresponding API Gateway endpoints, and the S3 bucket that holds the training dataset and the trained model.
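A rough sketch of the relevant parts of serverless.yml is shown below; the function names, handler paths, and bucket resource are assumptions for illustration, and the real configuration lives in the repo.
# serverless.yml (sketch) - names and paths are illustrative
functions:
  train:
    handler: src/trainer.handler
    events:
      - http:
          path: train
          method: get
  processIntent:
    handler: src/intent_processor.handler
    events:
      - http:
          path: processIntent
          method: post
resources:
  Resources:
    ModelBucket:
      Type: AWS::S3::Bucket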
The Lambda IAM role needs to contain the following permissions to be able to get, download, and upload the S3 objects:
[
  {
    "Effect": "Allow",
    "Action": [
      "s3:ListObjects",
      "s3:ListBucket"
    ],
    "Resource": {
      "Fn::GetAtt": [
        "ModelBucket",
        "Arn"
      ]
    }
  },
  {
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:HeadObject",
      "s3:PutObject*"
    ],
    "Resource": [
      {
        "Fn::Join": [
          "",
          [
            {
              "Fn::GetAtt": [
                "ModelBucket",
                "Arn"
              ]
            },
            "/*"
          ]
        ]
      }
    ]
  }
]
The serverless-python-requirements plugin will be used to package the Python modules. The configuration is as below:
custom:
  pythonRequirements:
    dockerizePip: true
    zip: true
'dockerizePip: true' uses Docker to package all the modules. Because the size of the Snips NLU dependencies exceeds the AWS Lambda 250 MB limit, 'zip: true' compresses all the dependencies and libraries so that they can be extracted at runtime into the /tmp folder, which has a larger limit of 512 MB. The serverless-python-requirements plugin also packages unzip_requirements.py into the zipped AWS Lambda artifact.
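For reference, the plugin itself is declared in the plugins section of serverless.yml:
plugins:
  - serverless-python-requirements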

unzip_requirements.py
import os
import shutil
import sys
import zipfile

pkgdir = '/tmp/sls-py-req'
sys.path.append(pkgdir)
if not os.path.exists(pkgdir):
    tempdir = '/tmp/_temp-sls-py-req'
    if os.path.exists(tempdir):
        shutil.rmtree(tempdir)
    zip_requirements = os.path.join(
        os.environ.get('LAMBDA_TASK_ROOT', os.getcwd()), '.requirements.zip')
    zipfile.ZipFile(zip_requirements, 'r').extractall(tempdir)
    os.rename(tempdir, pkgdir)  # Atomic
In every other Python script, the following snippet needs to be added at the top:
try:
    import unzip_requirements
except ImportError:
    pass
It guarantees that all the zipped packages are extracted to the /tmp folder and added to the module search path before anything else is imported at runtime.
In addition, the plugin by default uses the requirements.txt file to package the required Python modules into the zipped artifact. The requirements.txt below contains pretty much everything required except the Snips NLU language pack.
deprecation==2.0.6
docopt==0.6.2
future==0.16.0
num2words==0.5.9
numpy==1.15.4
packaging==19.0
plac==0.9.6
pyaml==17.12.1
pyparsing==2.3.1
python-crfsuite==0.9.6
scikit-learn==0.19.2
scipy==1.2.1
semantic-version==2.6.0
sklearn-crfsuite==0.3.6
snips-nlu==0.19.4
snips-nlu-parsers==0.2.0
snips-nlu-utils==0.8.0
tabulate==0.8.3
tqdm==4.31.1
The language pack is included in src/snips_nlu_en-0.2.1, and load_languages.py in the project root folder copies the language pack into the snips_nlu/data/en folder. Snips NLU will load the language from the en folder by default once you call the Python function below:
from snips_nlu import load_resources
load_resources(u"en")
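A minimal sketch of what load_languages.py might do, assuming it simply copies the bundled resources into the installed snips_nlu package (the exact implementation in the repo may differ):
# load_languages.py (sketch) - copies the bundled English resources into the
# installed snips_nlu package's data/en folder; paths are assumptions
import os
import shutil
import snips_nlu

src_dir = os.path.join("src", "snips_nlu_en-0.2.1")
dst_dir = os.path.join(os.path.dirname(snips_nlu.__file__), "data", "en")

if os.path.exists(dst_dir):
    shutil.rmtree(dst_dir)
shutil.copytree(src_dir, dst_dir)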
Deployment

Deploy the serverless stack to the AWS environment by executing the following command
AWS_ACCESS_KEY_ID=<aws_key_id> AWS_SECRET_ACCESS_KEY=<aws_secret_key> serverless deploy --stage <stage> --region ap-southeast-2
How to Use It
You need to upload the training dataset to 'bucket/bot/raw_data.json' before triggering the training process. Send a GET request to https://apigateway/stage/train to trigger the training process.
Send a POST request to https://apigateway/stage/processIntent with a payload like the one below to predict the intent:
{ "message": "<message>"}
The full source code is in the following GitHub repo: https://github.com/alexyurepercept/snips-nlu-serverless