Here I copy the Carto challenge, answering each question as it appears and providing details of how I solved the problem.
Challenge
Based on the attached environment_airq_measurand.csv file:
API
Create a REST API.
- Choose your preferred coding language / framework, or the ones that fit better in this context.
- environment_airq_measurand.csv is composed of simulated data coming from air quality measurements that were used during the development phase of a well known project for managing Smart Cities. Dataset fields:
  - Timestamp: measures are taken every 15 minutes.
  - id_entity: air quality station identifier.
  - so2: μg/m3 of SO2 (Sulfur dioxide).
  - no2: μg/m3 of NO2 (Nitrogen dioxide).
  - co: mg/m3 of CO (Carbon monoxide).
  - o3: μg/m3 of O3 (Ozone).
  - pm10: μg/m3 of PM10 (particulate matter 10 μm or less in diameter).
  - pm2_5: μg/m3 of PM2.5 (particulate matter 2.5 μm or less in diameter).
I used Python because I have previous experience with Flask, and I took this challenge as an excuse to finally learn SQLAlchemy, something I had always wanted to look into.
I used the postgres Docker image, whose entrypoint allows running scripts and SQL files on first run (more info in the image's Initialization scripts documentation). The schema is embedded in a ConfigMap used by the deployment manifest; the data to populate is also embedded in a ConfigMap, but it is loaded by a Job that the chart runs only once, on post-installation.
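As a rough sketch of that post-install hook (names like airq-seed and the psql invocation are illustrative assumptions; the actual templates in the chart directory are authoritative):

```yaml
# Hypothetical sketch of the post-install seed Job; the real chart
# templates are authoritative and all names here are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: airq-seed
  annotations:
    "helm.sh/hook": post-install                  # run once, after the release is installed
    "helm.sh/hook-delete-policy": hook-succeeded  # remove the Job when the load finishes
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: postgres:13
          # Load the CSV data embedded in the ConfigMap into the database
          # (connection/auth details omitted for brevity).
          command: ["psql", "-h", "airq-db", "-U", "airq", "-f", "/seed/data.sql"]
          volumeMounts:
            - name: seed-data
              mountPath: /seed
      volumes:
        - name: seed-data
          configMap:
            name: airq-seed-data
```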
API tasks
- Load the CSV into a PostgreSQL database.
- Create an endpoint /air_quality that delivers the CSV data in a well structured JSON.
- Create a script/tool that creates an API Docker image. Bear in mind that the docker image will be deployed to the production environment at some point, so it must be ready to be deployed on Kubernetes, or any other container environment. Please, explain your deployment choice here.
- Provide a docker-compose environment: it must include a tool and instructions to load the CSV data the first time it runs.
CSV loading is done by the chart instead of by the API.
The CI pipeline has steps to build the Docker image and push it to Docker Hub. I used Kubernetes simply because I have years of experience with it and run my own cluster on bare metal rented from Online.net.
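A sketch of what the build-and-push step could look like in .drone.yml (the repository's actual pipeline is authoritative; repository and secret names are illustrative):

```yaml
# Hypothetical sketch of the Drone build step; see .drone.yml for the
# real pipeline. Repo and secret names are illustrative.
kind: pipeline
type: kubernetes
name: build

steps:
  - name: docker
    image: plugins/docker            # Drone's Docker plugin builds and pushes the image
    settings:
      repo: myuser/airq-api          # illustrative Docker Hub repository
      tags:
        - latest
      username:
        from_secret: docker_username
      password:
        from_secret: docker_password
```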
Instead of providing a docker-compose environment, and to avoid maintaining two deployment definitions (docker-compose and the Helm chart), I prefer to show how to use Kubernetes in Docker (kind) and let developers and other teams install the Helm chart into a Kubernetes cluster, so the local environment resembles the development/integration/staging/production ones.
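For example, assuming kind is installed, a single-node local cluster is enough to install the chart (release name is illustrative; values files are the ones in this repo):

```yaml
# Hypothetical kind cluster config (kind.yaml). Usage, with an
# illustrative release name:
#   kind create cluster --config kind.yaml
#   helm install airq ./chart -f values-devel.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
```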
CI (Continuous Integration)
Elaborate on how to build a CI within this project, in order to deploy the API using GitHub and a cloud provider of your choice.
Well... here I had a misread. I did not use GitHub; I deployed my own git server (Gitea) and used my own CI (Drone CI) on my personal Kubernetes cluster instead of GitHub and a cloud provider.
I will completely understand if you take this challenge as failed at this point.
- A pull request in the master branch must be deployed to the production environment.
- A pull request to the staging branch must be deployed to the staging environment.
- A pull request to the development branch must be deployed to the dev environment.
I preferred to change this a bit. A push to master should deploy to production, not a pull request, since a pull request could lead to untested code being deployed to production.
A pull request to master should deploy a staging environment instead. Also, every push to devel should deploy to a devel environment to start smoke tests, plus an environment per pull request to devel to start integration tests.
Pull requests to devel are easy to implement, but Gitea gives me no hook when a PR is closed, so each devel environment becomes orphaned as soon as a new commit appears. I would have to develop a tool that asks Gitea for closed PRs, using some kind of annotation to find which release belongs to which PR, and cleans them up (see the sketch below).
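One possible shape for that cleanup, sketched under the assumption that such a tool exists as a container image (everything below is hypothetical): a CronJob that periodically asks Gitea for closed PRs and deletes the matching Helm releases.

```yaml
# Hypothetical sketch: nothing here exists yet. A CronJob would run the
# (not yet written) cleanup tool against the Gitea API on a schedule.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pr-env-cleaner
spec:
  schedule: "*/30 * * * *"             # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleaner
              image: myuser/pr-env-cleaner        # illustrative image name
              env:
                - name: GITEA_URL
                  value: https://gitea.example.net  # illustrative URL
```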
Please explain in detail how this would work and any technology involved in the process.
CI has autotagging using semantic-release, which parses commit messages to automatically bump versions.
The semantic-release configuration defines which commit message format is used and which branches produce a release or a prerelease, with multiple channels encoded in the tags.
All tags are deployed the same way, each to the environment of the channel it belongs to.
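A sketch of that branch/channel mapping (the repository's .releaserc is authoritative; semantic-release accepts this YAML form):

```yaml
# Hypothetical .releaserc sketch; the real file in the repo is
# authoritative. Branch names follow the environments described above.
branches:
  - master                  # full releases, stable channel
  - name: staging
    prerelease: true
    channel: staging        # prerelease tags like 1.2.0-staging.1
  - name: devel
    prerelease: true
    channel: devel          # prerelease tags like 1.2.0-devel.1
plugins:
  - "@semantic-release/commit-analyzer"          # parses commit messages to pick the bump
  - "@semantic-release/release-notes-generator"
  - "@semantic-release/git"
```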
Deployment
The preferred choice here is Kubernetes, but it is not mandatory, so if you pick any other solutions, please explain your choice and why you chose it.
Requirements
- 5 x API containers
- 1 x Database container (redundancy is not mandatory)
- 1 x Cache container: it should be deployed as an API cache, so once a response is cached, subsequent requests will be served from the cached content. Please elaborate your technology choice and any knowledge about caches and invalidation strategies. NOTE: changing the API code is not a requirement for implementing the cache layer.
- Provide YAML files (api, cache and db) for k8s, or the suitable configuration files for other technologies, along with instructions on how to use them.
- The provided solution must include all the required dependencies and detailed instructions for the deployment, i.e.: kubectl apply -f <file.yaml>
I only deploy 5 API containers in the stable channel, via its values-stable.yaml.
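The per-channel replica count lives in the values files; a sketch of the relevant excerpt (key names are illustrative, the actual values-*.yaml files are authoritative):

```yaml
# Hypothetical excerpt of values-stable.yaml; key names are illustrative.
api:
  replicaCount: 5        # the 5 required API containers, stable channel only
# values-devel.yaml / values-staging.yaml would set a smaller count here.
```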
Monitoring, logs and backup
Explain and/or provide the components/technologies required to cover a full monitoring, logs and backup system. An operator should be able to diagnose and detect issues with the tools provided.
Using a cloud provider abstracts database operations away from the operator, since the database is probably a managed service like AWS Aurora; there is nothing to operate in the database itself, and AWS (at least) provides a bare minimum of tooling to back it up, snapshot it and monitor it.
The API server is stateless, so a liveness probe (not implemented as of this writing) hitting the pod at /ping would tell whether Flask is up; and since all requests go to the logs, spurious HTTP codes can be spotted there. A proper metrics endpoint should be developed so that this monitoring is done by Prometheus instead of incurring the overhead of parsing logs.
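The probe itself would be a few lines in the api Deployment; a sketch, given that the /ping route is not implemented yet:

```yaml
# Hypothetical liveness probe for the api container; /ping does not
# exist yet, this is what it would look like once implemented.
livenessProbe:
  httpGet:
    path: /ping
    port: 5000           # assuming Flask's default port
  initialDelaySeconds: 5
  periodSeconds: 15
```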