Posts
-
The Meaning of ACID Consistency
For backend/full-stack software engineers and system design interviews, I highly recommend “Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems” by Dr. Martin Kleppmann.
I want to share how this book explains ACID consistency. ACID is a set of properties intended to guarantee data validity after any transaction, which can sometimes be composed of multiple database operations. A transaction satisfies the ACID properties if it
- completely fails, returning the database to its original state by discarding all writes, when any of its statements fail to complete (atomicity),
- transitions the database from one valid state to another (consistency),
- does not conflict with concurrent execution of other transactions (isolation), and
- remains committed regardless of hardware faults, database crashes, etc. (durability).
In my experience, when ACID consistency is explained, the focus is usually on defining integrity constraints in order to prevent database corruption (see https://en.wikipedia.org/wiki/ACID).
However, an application’s notion of consistency may require enforcing constraints that the database cannot. So, at the end of the day, it is actually the application that is responsible for preserving consistency by defining its transactions correctly. For this reason, Dr. Kleppmann concludes that consistency is actually a property of the application, unlike atomicity, isolation, and durability, and that “the letter C doesn’t really belong in ACID” (page 225).
-
Walking through the quicksort subroutine
As discussed previously, quicksort uses the following subroutine, implemented in JavaScript, to rearrange the elements in an array.
partition.test.js
const expect = require('chai').expect; const partition = require('./partition'); describe('partition', () => { it('rearranges `arr` from [46,50,99,32,49] to [46,32,49,50,99] and returns 2', () => { const arr = [46,50,99,32,49]; const p = 0; const r = arr.length - 1; const actual_q = partition(arr,p,r); const expected_arr = [46,32,49,50,99]; const expected_q = 2; expect(expected_arr).to.deep.equal(arr); // in-place sorting expect(expected_q).to.equal(actual_q) }); });
partition.js
module.exports = (arr, p, r) => { let i = p - 1; for (let j = p; j < r; j++) { if (arr[j] <= arr[r]) { i += 1; exchange(arr, i, j); } } const q = i + 1; exchange(arr, q, r); return q; } // helper function const exchange = (arr, i, j) => { const temp = arr[i]; arr[i] = arr[j]; arr[j] = temp; }
Now, we will walk through what the subroutine does for the unit test above. Note that
p
is used to set the initial values ofi
andj
,- inside the loop, the values located at
i
(incremented before the exchange) andj
are only exchanged ifarr[j]
is less than or equal toarr[r]
, as is the case whenj = 0
andj = 3
in this example, and, - after the loop, the values located at
q
(set toi + 1
) andr
are exchanged.
-
An important subroutine used by quicksort
Quicksort is an in-place sorting algorithm that uses an important subroutine for partitioning. The following JavaScript code implements quicksort.
const partition = require('./partition'); const quicksort = (arr, p, r) => { if (p < r) { const q = partition(arr, p, r); quicksort(arr, p, q - 1); quicksort(arr, q + 1, r); } } module.exports = quicksort;
This algorithm uses the divide-and-conquer approach, which divides the original problem into multiple problems that are similar but smaller, conquers these sub-problems by solving them recursively, and then combines all the solutions to the sub-problems to create a solution to the original problem.
Quicksort uses the
partition
subroutine to divide the original problem. It computes the indexq
and rearranges the arrayarr[p..r]
such that all the values in the array with a smaller index thanq
, thearr[p..q-1]
sub-array, are less than or equal to the valuearr[q]
. On the other hand, all the values that have an index greater thanq
, thearr[q+1..r]
sub-array, are greater thanarr[q]
. Here is an implementation of this subroutine.partition.js
module.exports = (arr, p, r) => { let i = p - 1; for (let j = p; j < r; j++) { if (arr[j] <= arr[r]) { i += 1; exchange(arr, i, j); } } exchange(arr, i + 1, r); return i + 1; } // helper function const exchange = (arr, i, j) => { const temp = arr[i]; arr[i] = arr[j]; arr[j] = temp; }
After the division comes the conquest. So, the sub-arrays
arr[p..q-1]
andarr[q+1..r]
are then sorted by recursive calls to quicksort. Since everything was done in place, the entire arrayarr[p..r]
is sorted after these recursive calls and no additional work is needed to combine all the solutions to the sub-problems. -
Checking for Integer Divisibility Using Binary Search
In a previous post, we proved that if $a = qd$ for integers $a, q, d$ such that $d \neq 0$, then $ \lvert a \rvert \geq \lvert q \rvert$. So, if we can find an integer $q$ in $[-a, a]$ such that $a = qd$, then we can say that $d$ divides $a$. The approach we used had a linear time complexity — some function of $2a + 1$ — because we naïvely checked each possible $q$ within that range.
However, we can find $q$ much faster by using binary search. The following solution has a time complexity of $\mathcal{O}(\log a)$. It passes all the same test cases as our previous approach. Note that we are still not using the division operator here; in order to find
mid
, we use bit shifting.Some thought should be given to numeric overflow when $q$ and $d$ are large numbers.
divides.test.js
const expect = require('chai').expect; const divides = require('./divides'); describe('divides', () => { // negatives it('returns true for -14 divides 28', () => { const expected = true; const actual = divides(-14, 28); expect(expected).to.equal(actual); }); it('returns false for -14 divides 13', () => { const expected = false; const actual = divides(-14, 13); expect(expected).to.equal(actual); }); it('returns true for 9 divides -18', () => { const expected = true; const actual = divides(9, -18); expect(expected).to.equal(actual); }); it('returns false for 32 divides -2', () => { const expected = false; const actual = divides(32, -2); expect(expected).to.equal(actual); }); it('returns true for -4 divides -56', () => { const expected = true; const actual = divides(-4, -56); expect(expected).to.equal(actual); }); it('returns false for -5 divides -99', () => { const expected = false; const actual = divides(-5, -99); expect(expected).to.equal(actual); }); // positives it('returns false for 15 divides 25', () => { const expected = false; const actual = divides(15, 25); expect(expected).to.equal(actual); }); it('returns true for 9 divides 72', () => { const expected = true; const actual = divides(9, 72); expect(expected).to.equal(actual); }); it('returns false for 27 divides 9', () => { const expected = false; const actual = divides(27, 9); expect(expected).to.equal(actual); }); // zeroes it('returns true for 13 divides 0', () => { const expected = true; const actual = divides(13, 0); expect(expected).to.equal(actual); }); it('returns false for 0 divides 42', () => { const expected = false; const actual = divides(0, 42); expect(expected).to.equal(actual); }); it('returns false for 0 divides 0', () => { const expected = false; const actual = divides(0, 0); expect(expected).to.equal(actual); }); // ones it('returns true for 1 divides 8', () => { const expected = true; const actual = divides(1, 8); expect(expected).to.equal(actual); }); it('returns false for 13 divides 1', () => { const expected = false; const actual = divides(13, 1); expect(expected).to.equal(actual); }); it('returns true for 1 divides 1', () => { const expected = true; const actual = divides(1, 1); expect(expected).to.equal(actual); }); });
divides.js
module.exports = (d, a) => { // some edge cases if (d == 0) { return false; } if (a == 0 || d == 1) { return true; } if (Math.abs(d) <= Math.abs(a)) { // carefully choosing our left and right indices cuts our // search range in half from [-a, a] to [-a, 0) or (0, a] let left; let right; if (a < 0) { if (d < 0) { left = 1; right = -a; } else { left = a; right = -1; } } else { if (d < 0) { left = -a; right = -1; } else { left = 1; right = a; } } // binary search while (left + 1 < right) { const mid = (left + right) >> 1; const prod = mid * d; if (a == prod) { return true; } if (d < 0) { if (a < prod) { left = mid; } else { right = mid; } } else { if (a < prod) { right = mid; } else { left = mid; } } } } return false; }
-
Custom Integer Division Function
If we are not allowed to use the division operator in a programming language, how can we perform integer division? The following code uses repeated subtraction to achieve this; its time complexity is Big O of the quotient. One interesting tidbit:
NaN
is the only value in JavaScript that does not equal itself, so the last unit test had to be adjusted to account for this.integerDivision.test.js
const expect = require('chai').expect; const integerDivision = require('./integerDivision'); describe('integerDivision', () => { it('returns 0 for 2 divided by 7', () => { const expected = 3; const actual = integerDivision(7, 2); expect(expected).to.equal(actual); }); it('returns 0 for -4 divided by 9', () => { const expected = 0; const actual = integerDivision(-4, 9); expect(expected).to.equal(actual); }); it('returns 0 for 3 divided by -11', () => { const expected = 0; const actual = integerDivision(3, -11); expect(expected).to.equal(actual); }); it('returns 0 for -19 divided by -21', () => { const expected = 0; const actual = integerDivision(-19, -21); expect(expected).to.equal(actual); }); it('returns 4 for 44 divided by 10', () => { const expected = 4; const actual = integerDivision(44, 10); expect(expected).to.equal(actual); }); it('returns -4 for -21 divided by 5', () => { const expected = -4; const actual = integerDivision(-21, 5); expect(expected).to.equal(actual); }); it('returns -7 for 35 divided by -5', () => { const expected = -7; const actual = integerDivision(35, -5); expect(expected).to.equal(actual); }); it('returns 3 for -23 divided by -6', () => { const expected = 3; const actual = integerDivision(-23, -6); expect(expected).to.equal(actual); }); it('returns 1 for -1 divided by -1', () => { const expected = 1; const actual = integerDivision(-1, -1); expect(expected).to.equal(actual); }); it('returns 1 for 1 divided by 1', () => { const expected = 1; const actual = integerDivision(1, 1); expect(expected).to.equal(actual); }); it('returns -1 for 1 divided by -1', () => { const expected = -1; const actual = integerDivision(1, -1); expect(expected).to.equal(actual); }); it('returns 0 for 0 divided by 13', () => { const expected = 0; const actual = integerDivision(0, 13); expect(expected).to.equal(actual); }); it('returns -0 for 0 divided by -23', () => { const expected = -0; const actual = integerDivision(0, -23); expect(expected).to.equal(actual); }); it('returns Infinity for 41 divided by 0', () => { const expected = Infinity; const actual = integerDivision(41, 0); expect(expected).to.equal(actual); }); // NaN is the only value is javascript that does not equal itself it('returns NaN for "a" divided by 9', () => { const expected = NaN; const actual = integerDivision("a", 9); expect(isNaN(actual)).to.equal(true); }); });
integerDivision.js
module.exports = (a,b) => { // Return NaN if 'a' or 'b' are not numbers if (isNaN(a) || isNaN(b)) return NaN; // Return Infinity like the JavaScript // division operator does when 'b' is 0 if (b == 0) return Infinity // Use this bool to return a negative result // when 'a' or 'b' and not both are negative let negative = false; // If negative, replace 'a' with its absolute // value and toggle the 'negative' bool, then // do the same for 'b' if (a < 0) { a = a * -1; negative = !negative; } if (b < 0) { b = b * -1; negative = !negative; } // Count the number of time that 'b' goes into 'a' let count = 0; while (a >= b) { a = a - b; count++ } return negative ? -count : count; }
-
Checking for Integer Divisibility
Suppose that we are not allowed to use the division operator or remainder operator in a programming language such as JavaScript. With this constraint, how can we check whether or not an integer evenly divides another? Consider the following approach.
Definition. Let $a$ and $d$ be integers. We say that $d$ divides $a$ if there exists an integer $q$ such that $a=qd$.
Proposition. Let $a, q, d$ be integers such that $d \neq 0$. If $a = qd$, then $ \lvert a \rvert \geq \lvert q \rvert$.
Proof. Since $ a=qd $ implies that $\lvert a \rvert = \lvert qd \rvert = \lvert q \rvert \cdot \lvert d \rvert \geq \lvert q \rvert $, then $\lvert a \rvert \geq \lvert q \rvert$. Thus, if $a = qd$, then $\lvert a \rvert \geq \lvert q \rvert$. $\blacksquare$
So, in order to check whether $d$ divides $a$, a computer program would just have to search for an integer $q$ between $-a$ and $a$ (inclusive) such that $a = qd$. The following approach has linear time complexity, a function of $a$; I would describe it as being naïve or brute force.
Some thought should be given to numeric overflow when $q$ and $d$ are large numbers.
divides.test.js
const expect = require('chai').expect; const divides = require('./divides'); describe('divides', () => { // negatives it('returns true for -14 divides 28', () => { const expected = true; const actual = divides(-14, 28); expect(expected).to.equal(actual); }); it('returns false for -14 divides 13', () => { const expected = false; const actual = divides(-14, 13); expect(expected).to.equal(actual); }); it('returns true for 9 divides -18', () => { const expected = true; const actual = divides(9, -18); expect(expected).to.equal(actual); }); it('returns false for 32 divides -2', () => { const expected = false; const actual = divides(32, -2); expect(expected).to.equal(actual); }); it('returns true for -4 divides -56', () => { const expected = true; const actual = divides(-4, -56); expect(expected).to.equal(actual); }); it('returns false for -5 divides -99', () => { const expected = false; const actual = divides(-5, -99); expect(expected).to.equal(actual); }); // positives it('returns false for 15 divides 25', () => { const expected = false; const actual = divides(15, 25); expect(expected).to.equal(actual); }); it('returns true for 9 divides 72', () => { const expected = true; const actual = divides(9, 72); expect(expected).to.equal(actual); }); it('returns false for 27 divides 9', () => { const expected = false; const actual = divides(27, 9); expect(expected).to.equal(actual); }); // zeroes it('returns true for 13 divides 0', () => { const expected = true; const actual = divides(13, 0); expect(expected).to.equal(actual); }); it('returns false for 0 divides 42', () => { const expected = false; const actual = divides(0, 42); expect(expected).to.equal(actual); }); it('returns false for 0 divides 0', () => { const expected = false; const actual = divides(0, 0); expect(expected).to.equal(actual); }); // ones it('returns true for 1 divides 8', () => { const expected = true; const actual = divides(1, 8); expect(expected).to.equal(actual); }); it('returns false for 13 divides 1', () => { const expected = false; const actual = divides(13, 1); expect(expected).to.equal(actual); }); it('returns true for 1 divides 1', () => { const expected = true; const actual = divides(1, 1); expect(expected).to.equal(actual); }); });
divides.js
module.exports = (d, a) => { // some edge cases if (d == 0) { return false; } if (a == 0 || d == 1) { return true; } if (Math.abs(d) <= Math.abs(a)) { let start = -a; let end = a; if (a < 0) { start = a; end = -a; } for (let q = start; q <= end; q++) if (a === q * d) return true; } return false; }
-
Node.js TDD Project
According to Wikipedia, test-driven development (TDD) is a software development process that converts software requirements into test cases before the software is actually developed. The steps of the TDD cycle are as follows:
- Add a test (the new test should fail)
- Write the simplest code that makes the new test pass (YAGNI)
- Refactor the code
So, in order to set up a Node.js project for TDD, we need a way to run unit tests on the code that will be developed for it. There are two popular tools for doing this: Mocha is a popular JavaScript test framework and Chai is an assertion library that can be nicely paired with it.
We can get started by launching a bash shell and running the following commands to initialize such a project.
mkdir tdd-project cd tdd-project npm init -y npm install --save-dev mocha npm install --save-dev chai
The
tdd-project
directory should now have a new file calledpackage.json
which contains the following:{ "name": "tdd-project", "version": "1.0.0", "description": "", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "keywords": [], "author": "", "license": "ISC", "devDependencies": { "chai": "^4.3.4", "mocha": "^9.1.3" } }
Now all we have to do is modify the
scripts
property in this file. We could simply make thetest
script run themocha
command. However,mocha
by default only finds test files with extensions.js
,.mjs
, and.cjs
in a directory namedtest
. On the other hand, the following script finds all test files ending with.test.js
in all directories exceptnode_modules
.{ "name": "tdd-project", "version": "1.0.0", "description": "", "main": "index.js", "scripts": { "test": "mocha \"./{,!(node_modules)/**/}*.test.js\"" }, "keywords": [], "author": "", "license": "ISC", "devDependencies": { "chai": "^4.3.4", "mocha": "^9.1.3" } }
After adding the
test
script, we can then run all tests by callingnpm test
in the project directory. To get started with writing test files, we create a new file somewhere within the project (not in thenode_modules
directory of course) namedsample.test.js
. Then we add the following contents:const expect = require('chai').expect; describe('sample', () => { it('fails this test', () => { expect(true).to.equal(false); }); });
Running
npm test
should now return the following:> tdd-project@1.0.0 test > mocha "./{,!(node_modules)/**/}*.test.js" sample 1) fails this test 0 passing (2ms) 1 failing 1) sample fails this test: AssertionError: expected true to equal false + expected - actual -true +false at Context.<anonymous> (sample.test.js:5:19) at processImmediate (node:internal/timers:464:21)
Congratulations on writing your first failing test! I have placed all the code in this post in a Github repository for your convenience: https://github.com/miguelamezola/tdd-project.git
-
Numbers Everyone Should Know
These performance numbers were put out by Jeff Dean more than 10 years ago. Since hardware has improved, they are now outdated. However, they still give us some idea of how the operations compare to one another. Simon Eskildsen has collected updated numbers in https://github.com/sirupsen/napkin-math.
Operation Latency (ns) L1 cache reference 0.5 Branch mispredict 5 L2 cache reference 7 Mutex lock/unlock 25 Main memory reference 100 Compress 1K w/cheap compression algorithm 3,000 Send 2K bytes over 1 Gbps network 20,000 Read 1 MB sequentially from memory 250,000 Round trip within same datacenter 500,000 Disk seek 10,000,000 Read 1 MB sequentially from disk 20,000,000 Send packet CA->Netherlands->CA 150,000,000 -
Python defaultdict collection
In python,
defaultdict
anddict
are almost the same except that the former never raises aKeyError
. Instead, it provides a default value for the key that does not exist. In the code below,seen
is adefaultdict
that haslist[int]
as its default value.from typing import List from collections import defaultdict def two_sum(nums: List[int], target: int) -> List[int]: seen = defaultdict(list[int]) for j in range(len(nums)): complement = target - nums[j] if complement in seen: return [seen[complement][0],j] seen[nums[j]].append(j) return []
On the other hand, using
dict
would look something like this.from typing import List def two_sum(nums: List[int], target: int) -> List[int]: seen = dict() for j in range(len(nums)): complement = target - nums[j] if complement in seen: return [seen[complement][0], j] if nums[j] in seen: seen[nums[j]].append(j) else: seen[nums[j]] = [j] return []
I like the first implementation better for its conciseness and human readability.
-
Multics Simulator
Multics was a mainframe timesharing operating system that began in 1965 and was used until 2000. Today I found out that there’s a simulator that can create a Multics machine and that it can even run on a Raspberry Pi! https://multicians.org/
-
Acceptance test studio project
I really like Cucumber, a single tool for automated tests and living documentation, using executable specifications written in plain text. Since I’m searching for ways to enhance my portfolio, I think it would be fun to make an application that can edit and run these specifications from a web browser.
-
A new focus on personal projects
I really need to focus on completing personal projects. My portfolio has always needed improvement and I have just been pushing it off. I think I’ll start scheduling time to work on this every week and discipline my self to follow the plan.
-
AdaBoost model near 0.95 training accuracy
Tuning the number of estimators, learning rate, and max depth for the base estimator tree has improved the model. Now, I need to investigate whether or not that the model is over-fitting the training data. New submission to the Kaggle competition is in the works.
-
Tweaking AdaBoost hyperparameters
I have to admit, yesterday I had already given up on using AdaBoost on the MNIST data set. But I had only been adjusting its n_estimators and learning_rate hyperparameters. So, I decided to try using a large number of estimators (200 - 1000), tweaking the learning rate (0.2,0.25,0.3,0.35), and changing the classification tree max depth for the base estimator (1,2,3).
from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score X_train, X_val, y_train, y_val = train_test_split(X, y) n_trees = [200, 400, 600, 800, 1000] depths = [1,2,3] rates = [0.2,0.25,0.3,0.35] for n_estimators in n_trees: ada_clf.n_estimators = n_estimators; for depth in depths: ada_clf.base_estimator = tree(random_state=619,max_depth=depth) for learning_rate in rates: ada_clf.learning_rate = learning_rate ada_clf.fit(X_train, y_train) y_pred = ada_clf.predict(X_val) accuracy = accuracy_score(y_val, y_pred) print(n_estimators,depth,learning_rate,accuracy)
The results are promising: this model blew past the training accuracy scores attained by the CART classification tree I used a few days ago on this data set. The best model has an accuracy of 0.896 (rounded to the nearest thousandth); using 800 estimators, a max depth of 3, and a 0.20 learning rate.
-
AdaBoost approach to MNIST
Three days ago, I trained a CART classification tree model on the MNIST data set. I submitted its predictions to the Digit Recognizer competition on Kaggle, which uses this data set. The submission received an accuracy score of 0.86346.
So, I was expecting for this AdaBoost model that I’m training today to perform better; its base algorithm is also a classification tree. Surprisingly, it hasn’t received an accuracy score greater than 0.81 on the training set. I think this indicates that tree-based algorithms do not perform well on this data set.
-
Cricket chirping correlated to ambient temperature
I never knew that crickets chirp faster as their ambient temperature rises but it’s true! This could make for an awesome elementary school science project; i.e. the student can collect cricket data and then use a machine learning algorithm to predict the temperature given a chirp rate or vice versa.
-
My first Kaggle competitions
Over the weekend, I joined two popular knowledge competitions on Kaggle, Titanic - Machine Learning from Disaster and Digit Recognizer.
I was very familiar with the first one, since I had implemented the CART algorithm in college and used this competition’s dataset to compare my implementation’s performance with that of scikit-learn Decision Trees. The second competition caught my eye because of its use of the famous MNIST dataset.
I’m really excited about the Jupyter Notebooks environment offered by Kaggle. Here are my notebooks for these competitions: