Thoughts from a life in software

Showing posts from February, 2023

Storage of Vectorisation and Effective Querying

February 26, 2023

A while back I got to play with vectorisation of terms using a transformer model and Facebook's Faiss, and I think there is a much better way. The idea is that each document (a block of unstructured text) within a corpus (a collection of documents) is tokenised (split into groups of terms). For example, take the following document: "The cat in the hat sat on the mat and drank milk from a jug. The child stared in alarm at the cat in the hat as that was his milk!" In Natural Language Processing the first step is to remove stop words. Stop words are the commonly used words within a language. They are typically used to join adjectives, nouns, etc., and so they quickly dominate any statistical analysis. For example, "the" is a stop word. Removing the stop words from our example gives the following: "Cat hat sat mat drank milk from jug. Child stared alarm cat hat his milk" Now we want to convert this into a series of tokens; the token size depends on your document (you don't want it to be larger than ...
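
The stop-word step in the excerpt above can be sketched in a few lines of Python. This is only an illustration: the stop list is a hypothetical one, picked so that the output matches the example in the post (it keeps "from" and "his", which many published stop lists would drop), and the function name `remove_stop_words` is not taken from the original.

```python
import re

# Hypothetical stop list, chosen only to reproduce the example in the post.
# Real lists (for instance those shipped with NLP libraries) are much longer.
STOP_WORDS = {"the", "in", "on", "and", "a", "at", "as", "that", "was"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


document = (
    "The cat in the hat sat on the mat and drank milk from a jug. "
    "The child stared in alarm at the cat in the hat as that was his milk!"
)

terms = remove_stop_words(document)
print(" ".join(terms))
```

With this stop list the remaining terms are the same words as in the post's example, lowercased and without punctuation. A different stop list would give a different result, which is worth keeping in mind when comparing statistics across corpora.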

Continuous Integration is an organisation problem

February 15, 2023

After 10 years in DevSecOps, I still see an assumption that every project is unique and needs to deploy its own Continuous Integration (CI) instance and write its own Continuous Integration / Continuous Deployment (CI/CD) pipelines. However, a CI pipeline should produce a build artefact which is then supplied to a CD pipeline. Many CD pipelines can be triggered by various means. This allows you to manage CI and CD pipelines separately. While the mechanism and configuration of a deployed product can vary greatly between software products, CI pipelines all face the same constraints, which means each project is highly limited in the process it must implement. This blog will outline the reasoning behind the last statement. There are only so many build systems: modern software languages have build automation systems and dependency management systems, which aim to automate the mundane tasks in building, testing and releasing a software project. The ...