Below are some of the interesting bugs that I encountered over the course of my software engineering career.
Hermes bytecode with dynamic function generation
The Hermes engine compiles JavaScript functions into bytecode to optimize React Native apps. However, some libraries depend on dynamic function generation to work.
While developing AI Simulator, I encountered this issue when using the numjs and cwise libraries.
Initially I switched to JSC to work around the issue. Then I tried my luck and raised an issue about this with Hermes.
To my surprise, there is actually a way to disable Hermes bytecode for specific functions:
```javascript
function foo() {
  "show source";
  return 1 + 1;
}
print(foo.toString());
```
So I patched numjs to use this, and it now works with the Hermes engine.
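To make the failure mode concrete, here is a minimal sketch of dynamic function generation that depends on a function's source text. The `regenerate` helper is hypothetical and is not how numjs or cwise are actually implemented, and the `[bytecode]` placeholder string is an assumption about what a compiled function reports as its source.

```javascript
// Hypothetical helper: rebuilds a function from its own source text,
// roughly the way a code-generating library might.
function regenerate(fn) {
  const source = fn.toString();
  // Pull the parameter list and the body out of "function name(params) { body }".
  const match = source.match(/^function\s*\w*\s*\(([^)]*)\)\s*\{([\s\S]*)\}$/);
  if (!match) throw new Error("cannot parse source: " + source);
  const [, params, body] = match;
  return new Function(params, body);
}

function add(a, b) {
  return a + b;
}

// With the real source available, the regenerated function behaves like the original.
const regenerated = regenerate(add);
console.log(regenerated(1, 2)); // 3

// Assumed stand-in for a function whose source was replaced by a bytecode placeholder.
const compiled = { toString: () => "function add(a0, a1) { [bytecode] }" };

let failure = null;
try {
  regenerate(compiled)(1, 2);
} catch (error) {
  failure = error.name;
}
console.log(failure); // ReferenceError: the placeholder body is not the original code
```

Under these assumptions, anything that rebuilds code from `toString()` breaks once the source is gone, which is why keeping the source for selected functions is enough to unblock such a library.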
tf.js gather error
After I implemented PPO in tf.js, the code threw an error when computing gradients. The error occurred for the gradient associated with tf.gather.
There doesn’t seem to be anything wrong with the output of tf.gather, just that the gradient computation complains that the shape is wrong:
```
Error: Invalid TF_Status: 3
Message: Input to reshape is a tensor with 64 values, but the requested shape has 4096
```
So I decided to investigate whether tf.gather has an internal issue by writing my own gather function:
```typescript
// For each batch row i, gathers from input[i] using indices[i].
function gatherOwn(input: tf.Tensor, indices: tf.Tensor): tf.Tensor {
  const batchSize = input.shape[0];
  // Split along the batch axis into batchSize separate tensors.
  const tensors = input.split(batchSize, 0);
  const output = tf.stack(
    tensors.map((tensor, i) => tensor.squeeze().gather(indices.arraySync()[i]))
  );
  return output;
}

const result = gatherOwn(input, indices);

const grads = optimizer.computeGradients(() => {
  // computation using result
});

// No error, gradient computes as expected
```
And it worked perfectly fine without any errors.
So it was tf.gather that was not working as expected, and I opened an issue with tf.js.
NaN on iOS
When training a DQN model with tf.js, I encountered an issue where the numbers were all NaN on iOS, but they worked fine on Android.
After investigating, I found that the NaNs started appearing when the gradients were applied. So I looked into the Adam optimizer, which performs the update. It turns out that the Adam optimizer needs an epsilon parameter for numeric stability, and by default it is a constant defined in the backend kernel.
tf.js on React Native uses the WebGL backend, where the default is 1e-7 for float32 and 1e-4 for float16. It does have some logic to detect whether the OS supports float32, but maybe it doesn’t work on iOS?
I manually set the epsilon parameter to 1e-4 and the NaNs went away.
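Why a too-small epsilon can produce NaN is easier to see with a single scalar Adam update. The sketch below is plain JavaScript rather than tf.js code, omits bias correction for brevity, and models the suspected failure by passing an epsilon of 0, on the assumption that 1e-7 is flushed to zero in half precision. I did not confirm that this is what happens on iOS.

```javascript
// One Adam update for a single scalar weight (bias correction omitted for brevity).
function adamStep(state, grad, { lr = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon }) {
  const m = beta1 * state.m + (1 - beta1) * grad;
  const v = beta2 * state.v + (1 - beta2) * grad * grad;
  // epsilon keeps the denominator away from zero when v is zero or tiny.
  const weight = state.weight - (lr * m) / (Math.sqrt(v) + epsilon);
  return { weight, m, v };
}

const initial = { weight: 0.5, m: 0, v: 0 };

// A zero gradient on the first step leaves both m and v at 0.
const underflowed = adamStep(initial, 0, { epsilon: 0 }); // epsilon assumed flushed to zero
const stable = adamStep(initial, 0, { epsilon: 1e-4 });

console.log(underflowed.weight); // NaN, from 0 / 0
console.log(stable.weight);      // 0.5
```

In tf.js itself, epsilon is the fourth argument of `tf.train.adam(learningRate, beta1, beta2, epsilon)`, so it can be set explicitly instead of relying on the backend default.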
process.env undefined?
I have a test file that looks like this:
```javascript
const myFunction = () => {
  if (process.env.MY_ENV) {
    // doSomething();
  }
}

afterEach(() => {
  process.env.MY_ENV = undefined;
});

it('called when MY_ENV is set', () => {
  process.env.MY_ENV = 1;
  // assert doSomething called
})

it('not called when MY_ENV is not set', () => {
  // assert doSomething not called
})
```
The idea is that I want to test whether the condition on the environment variable MY_ENV is working correctly. However, the second test, where process.env.MY_ENV is supposed to be undefined, kept failing. I tried to print out the value of process.env.MY_ENV, but it was showing “correctly” as undefined.
After debugging for half an hour, I decided to take a look at process.env directly. And I was shocked by what I found: process.env.MY_ENV is actually the string ‘undefined’ instead of the value undefined. So that’s why when I printed it out, it looked correct. It is completely unexpected that the assignment of undefined does not work for process.env when it works for everything else I had tried so far.
It was indeed raised as an issue in the Node.js repo, and documentation was subsequently added specifically for this weird behaviour.
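The behaviour is easy to reproduce outside the test suite. The sketch below shows the string coercion on current Node.js versions and uses `delete` to actually unset the variable, which is what the `afterEach` hook needs in order to reset the state.

```javascript
// Assigning to process.env converts the value to a string.
process.env.MY_ENV = undefined;
const afterAssign = process.env.MY_ENV;
console.log(typeof afterAssign);          // string
console.log(afterAssign === "undefined"); // true
console.log(Boolean(afterAssign));        // true, so `if (process.env.MY_ENV)` still passes

// Deleting the property is what really unsets the variable.
delete process.env.MY_ENV;
const afterDelete = process.env.MY_ENV;
console.log(afterDelete === undefined);   // true
console.log("MY_ENV" in process.env);     // false
```

Printing the value hides the problem because the string and the real undefined look identical in the console; checking `typeof` makes the difference visible.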
JavaScript / npm / Node
- Module resolution bugs
- hoisting and peer dependency
- npm vs yarn
- checking if a package is installed
- Node bug
LSEP Character on HTML in Windows Chrome
After adding a new post to my company’s engineering blog, I discovered that a weird character was being rendered on the live website (at the end of some lines).
However, we didn’t notice that character when editing the markdown on macOS.
After digging through some resources, we found out that it only shows up on Windows systems. The interesting thing is that even though we cannot see the character on macOS, we can still select it using the keyboard arrow keys and do a replace operation in our markdown files (thanks to yuan3y).
So we did a replace and fixed it with a PR that shows no changes on macOS: https://github.com/grab/engineering-blog/pull/16
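Since the character is invisible in some editors, a small script can locate and remove it. The sketch below assumes the character is U+2028 (LINE SEPARATOR), which is what the LSEP glyph stands for. The helper names and the sample text are made up for illustration, and whether the character should be deleted or replaced with a normal newline depends on the content.

```javascript
const LSEP = "\u2028"; // Unicode LINE SEPARATOR

// Returns the 1-based line numbers that contain the character.
function findLineSeparators(text) {
  return text
    .split("\n")
    .flatMap((line, index) => (line.includes(LSEP) ? [index + 1] : []));
}

// Removes every occurrence of the character.
function stripLineSeparators(text) {
  return text.replace(/\u2028/g, "");
}

// Made-up sample with the character at the end of the first and third lines.
const markdown = "First line\u2028\nSecond line\nThird line\u2028";

const affectedLines = findLineSeparators(markdown);
const cleaned = stripLineSeparators(markdown);

console.log(affectedLines);          // [ 1, 3 ]
console.log(cleaned.includes(LSEP)); // false
```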
Increasing Number of Repeated Logs – Sept 2017
When I was writing a new service in our Rails server to talk to another service, something strange happened. I noticed that the same log for requests sent by the service was being printed to the console more than 10 times. And as time went by, the number of repeated logs seemed to be increasing.
When I first tried to approach the issue, I thought, “Maybe I didn’t follow the correct format as the other services?”
So I changed the structure of my new service to be exactly the same as the other services and tried sending a request again. The log still appeared multiple times. Maybe the hot reloading was not working? So I restarted the server and tried again. This time, the issue disappeared. “Great!”, I thought, the issue was fixed.
However, a few hours later, it started happening again. Now it was printing each log 4 times, fewer than the initial 10, but the problem was still not solved.
Could it be due to the multiple Rails consoles that I had opened? I restarted the server, sent a request, opened another console, and sent a request again; the log was only printed once.
Then I moved on to write some code, and when I came back to check the log a few hours later, it was being printed multiple times again.
I decided to add some logging using the Ruby puts method in the code to check where the repeated log printing originated from. After adding the logging and restarting the server, I once again moved on to other things.
As expected, I saw 2 repeated logs printed, but the code that calls the logger was only called once, so it had to be a problem with the subscriber.
The moment of truth came when I removed my puts from the code. After sending a request, I saw 3 repeated logs, one more than before.
Following this, I tried undoing the removal of puts, and (un)surprisingly, there were now 4 repeated logs.
So it was the editing of the file that caused the log to be printed repeatedly? Then I came to the realization that it must be the hot reload. Something during the hot reload triggered the subscriber to re-subscribe.
Indeed, after inspecting the service, I noticed that the code that subscribes to the logging event was outside the service class. So it was being executed every time a hot reload was carried out.
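The mechanism is not specific to Rails. Here is a minimal sketch in JavaScript, using Node's EventEmitter as a stand-in for the notification system. The names `notifications` and `loadServiceFile` are hypothetical, and calling the function several times only simulates what a hot reload does when it re-evaluates the file.

```javascript
const { EventEmitter } = require("events");

// The event bus lives for the whole process and survives every reload.
const notifications = new EventEmitter();
const printed = [];

// Stand-in for the service file: everything in here runs on every (re)load.
function loadServiceFile() {
  // Subscribing at file level adds one more listener each time the file is loaded.
  notifications.on("request", (message) => printed.push(message));
}

loadServiceFile(); // initial boot
loadServiceFile(); // file edited, hot reload
loadServiceFile(); // file edited again, another hot reload

notifications.emit("request", "GET /status");
console.log(printed.length); // 3: one request, logged once per accumulated listener
```

This matches the symptoms: a restart resets the listener count to one, and every subsequent edit of the file adds another copy of the same log line.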
Case closed.