Somax Hardware Testing and Qualification

New hardware is a normal part of building high tech. This area collects results and notes on prospective hardware.

ARM Compute Engine

Device: Rockchip RK3399Pro Rock960

Description: The RK3399Pro is a hexa-core SoC (2x Cortex-A72 + 4x Cortex-A53) used on Raspberry Pi-style boards. The Rock960 can be found at Seeed Studio for $99 (2 GB DDR3 / 16 GB eMMC) or $139 for the same board with 4 GB DDR3 / 32 GB eMMC. The RK3399Pro also includes a Mali-T860 GPU and an NPU rated at 2.4 TOPS.

Justification: Somax, stated most simply, is an autonomous audio-visual capture system that uses AI to direct capture and playback in a way that is useful to humans. AI is computationally expensive, and historically that has not boded well for mobile platforms. This changed in late 2017 when Huawei announced the Kirin 970. The 970 is not the first AI platform for ARM, but it is the fastest! It is reported to deliver 2.4 TOPS while using a little more than 1 watt of power. In January 2018, Rockchip announced the RK3399Pro. Boards based on it retail for a base price of $99 and claim the same 2.4 TOPS as the Kirin 970, which costs at least $300. RK3399Pro-based boards keep cost down by using slightly less capable parts, such as DDR3 RAM instead of DDR4. It is certainly worth a look, as this device could be quite useful on a mobile platform such as Somax.

September 12, 2018

I hooked the board up and got it booted into Android without much effort. Getting it to boot Ubuntu 16.04 took a bit more work. One small snag: flashing requires a USB-C cable.

Once booted into Linux, I was able to configure Wi-Fi and switch over from a wired Ethernet connection.

Next I tried to find some trace of an NPU. I looked all through /sys and /proc and found mentions of the Mali GPU, but nothing about an NPU. I also had trouble running lspci on the board and could not see any PCI devices in /sys or /proc. /proc/iomem did mention the PCI memory space, but nothing else.
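A search like the one above can be scripted so it is repeatable on the next board. The Python below is a minimal sketch under my own assumptions: it only matches file and directory names (not file contents) against the keywords "npu" and "mali", caps the walk depth so /proc and /sys stay manageable, and pulls the PCI-related ranges out of /proc/iomem. The function names `find_named_entries` and `pci_regions` are hypothetical helpers, not part of any board SDK, and non-root users may see zeroed addresses in /proc/iomem.

```python
import os
import re

# Hypothetical keyword list: "npu" is the target, "mali" is a control,
# since the Mali GPU did show up on the board.
KEYWORDS = ("npu", "mali")


def find_named_entries(root, keywords=KEYWORDS, max_depth=4):
    """Walk `root` (e.g. /sys or /proc) and return paths whose last
    component contains any keyword, case-insensitively. Symlinks are not
    followed (os.walk default), which keeps the walk out of /sys loops."""
    hits = []
    root_depth = root.rstrip(os.sep).count(os.sep)
    # onerror swallows permission errors and vanishing /proc entries.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda e: None):
        if dirpath.count(os.sep) - root_depth >= max_depth:
            dirnames[:] = []  # prune: do not descend any further
        for name in dirnames + filenames:
            if any(k in name.lower() for k in keywords):
                hits.append(os.path.join(dirpath, name))
    return hits


def pci_regions(iomem_text):
    """Return (start, end, label) for /proc/iomem lines that mention PCI."""
    regions = []
    for line in iomem_text.splitlines():
        m = re.match(r"\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+)", line)
        if m and "pci" in m.group(3).lower():
            regions.append((int(m.group(1), 16), int(m.group(2), 16),
                            m.group(3).strip()))
    return regions


if __name__ == "__main__":
    for root in ("/sys", "/proc"):
        for path in find_named_entries(root):
            print(path)
    try:
        with open("/proc/iomem") as f:
            for start, end, label in pci_regions(f.read()):
                print(f"{start:#x}-{end:#x} {label}")
    except OSError as err:
        print("could not read /proc/iomem:", err)
```

An empty keyword hit list plus PCI ranges with no devices behind them is exactly the pattern described above.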

At this point I was becoming convinced that the board had no NPU.

I checked the stats reported by the Tengine inference engine developed by Open AI Lab. Its performance benchmarks do not mention the RK3399Pro; instead they list different combinations of ARM Cortex-A72 and A53 cores, with no mention of an NPU anywhere. Even if that were an oversight and the NPU were present, the numbers look off: 51 ms on SqueezeNet and 56 ms on MobileNet is far too slow for a device supposedly capable of 2.4 TOPS.
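To put "far too slow" in numbers, here is a back-of-the-envelope check in Python. It assumes MobileNet v1 (1.0, 224x224) costs about 569 million multiply-accumulates per image (the figure from the MobileNet v1 paper), counts each multiply-accumulate as 2 operations, and assumes perfect utilization of the rated peak; `latency_gap` is a hypothetical helper. Real engines never hit their peak, so treat the result as an order-of-magnitude sanity check only.

```python
# Assumption: MobileNet v1 1.0-224 needs ~569 million MACs per inference.
MOBILENET_MACS = 569e6


def latency_gap(macs, rated_tops, measured_ms, ops_per_mac=2):
    """Compare a measured latency with the ideal latency at a rated peak.

    Returns (ideal_ms, effective_gops, gap), where gap is how many times
    slower the measurement is than the ideal, perfectly utilized case.
    """
    ops = macs * ops_per_mac                       # multiply + add per MAC
    ideal_ms = ops / (rated_tops * 1e12) * 1e3     # time at 100% of peak
    effective_gops = ops / (measured_ms / 1e3) / 1e9
    gap = rated_tops * 1e3 / effective_gops        # TOPS -> GOPS ratio
    return ideal_ms, effective_gops, gap


if __name__ == "__main__":
    ideal, eff, gap = latency_gap(MOBILENET_MACS, 2.4, 56.0)
    print(f"ideal latency at 2.4 TOPS: {ideal:.2f} ms")
    print(f"effective throughput at 56 ms: {eff:.1f} GOPS")
    print(f"gap: {gap:.0f}x")
```

Under these assumptions the ideal latency is roughly half a millisecond, while 56 ms corresponds to about 20 GOPS, more than 100 times below the advertised 2.4 TOPS. That is consistent with the benchmark running on the CPU cores rather than an NPU.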

I stopped qualification at this point and contacted Seeed. A rep responded quickly and explained that a miscommunication was the problem: another rep had answered the question "Is this the Rock960 with an NPU?" with "yes." That turns out not to be true, and it should be corrected over at Seeed soon. The rep also mentioned that Seeed will soon have an RK3399Pro board, code-named the Rock970.

If the Rock970 is priced closer to the Rock960 than to the HiKey970, then I think I'm gonna bite and give it a try.

Final Qualification: Rejected. The Rock960 from Seeed Studio has no NPU and is not based on the RK3399Pro.

September 6, 2018

Board arrived from Seeed. I will get pictures of the unboxing up pronto.

Seeed Rock960 / RK3399Pro in boxes at the lab.

September 5, 2018

I have been on the prowl for compute engines that are low power and capable of copious amounts of inference. So far I have spec'ed the Intel Myriad 2 and Myriad X. The Myriad 2 is available as the Neural Compute Stick or on the UP AI board, which uses the same part; UP also has a version in the pipeline with two Myriad 2 processors, in effect doubling the GFLOPS. The Myriad X is not easy to get your hands on, though I've been looking. Other options, such as the Laci compute engine, have yet to materialize as hardware.

Recently I said that I wasn't able to find a Raspberry Pi solution suitable for Somax's needs. Well, that has changed. Kinda. I looked deeper and found that the Raspberry Pi still did not have the needed capacity, but (and whoa daddy, this is a really good but) when you broaden your horizons to anything with an ARM processor, you get a few really good options! Qualcomm has the Snapdragon line, Huawei has the Kirin line, Xilinx has the 96 line, and Rockchip has the Rock960 Pro line. Specs looked good on all of them. I mean good as in "stop looking, you've found what you need!" Some of these engines top out at 2.4 TOPS at 8-bit precision, with 16-bit scaling linearly down to 1.2 TOPS. For perspective, the NVIDIA Pascal line of GPGPUs handles 8-bit precision (all of this is integer, mind you, so it's still apples to apples) at 47 TOPS. A portable version of Pascal is found in the Jetson TX2 platform. The TX2 uses 7.5 watts on average, but when it's really cranking on inference and getting that 47 TOPS, it will use 13 watts. The ARM engines deliver 2.4 TOPS for the same inference (identical in every respect other than power and throughput) at about 1 watt (I'm estimating 1.2 watts until I can prove otherwise), or 0.5 watts per TOPS. The same math puts the TX2's Pascal engine at about 0.28 watts per TOPS. Supposing we only need 2.4 TOPS and power scales linearly, the TX2 would need about 0.66 watts for the same work that costs 1.2 watts, about double, on ARM. On the basis of computation per watt actually needed (not the maximum), the NVIDIA chip looks better. The downside of the NVIDIA platform is price: the processing module is about $300, and it needs a carrier board for another $300. The ARM solution can be had for $99 with 2 GB DDR3-1866 / 16 GB eMMC, or $139 for a 4 GB / 32 GB combo of the same, with everything above (module included) integrated into the carrier.
No math to do here: if you only need 2.4 TOPS, it will cost $99 for a bare-bones board that may do in most cases, or $139 for one that will probably do for most well-crafted algorithms. The TX2 has one price, which means $600 for hardware that can sustain the required 2.4 TOPS. I expect both claims to be best-case, but I expect that from both, so it's a wash in my opinion. Size is another difference, and it favors the ARM solutions. The TX2 with carrier is 170 x 171 x 15 millimeters (module 50 x 87 x ??), while the ARM board is 85 x 54 millimeters for everything. One fits in your pocket; the other is better left in the car!
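The power, cost, and size comparison above can be reproduced with a short Python sketch. It plugs in the figures quoted in this entry as-is (1.2 W estimated for the ARM board at 2.4 TOPS, 13 W at 47 TOPS for the TX2, $99 vs $600, and the board outlines) and assumes power scales linearly with delivered throughput. The `Platform` class is just an illustrative container, not part of any vendor SDK.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    peak_tops: float     # 8-bit peak throughput, as quoted in this log
    peak_watts: float    # power draw at that peak, as quoted in this log
    cost_usd: float      # hardware needed to run it (module + carrier)
    footprint_mm: tuple  # (length, width) of the complete assembly

    def watts_per_tops(self):
        return self.peak_watts / self.peak_tops

    def watts_at(self, tops):
        # Assumption: power scales linearly with delivered throughput.
        return self.watts_per_tops() * tops

    def area_mm2(self):
        return self.footprint_mm[0] * self.footprint_mm[1]


ARM = Platform("RK3399Pro board", 2.4, 1.2, 99.0, (85, 54))
TX2 = Platform("Jetson TX2 + carrier", 47.0, 13.0, 600.0, (170, 171))
NEEDED_TOPS = 2.4

if __name__ == "__main__":
    for p in (ARM, TX2):
        print(f"{p.name}: {p.watts_per_tops():.2f} W/TOPS, "
              f"{p.watts_at(NEEDED_TOPS):.2f} W at {NEEDED_TOPS} TOPS, "
              f"${p.cost_usd:.0f}, {p.area_mm2()} mm^2")
    print(f"TX2 costs {TX2.cost_usd / ARM.cost_usd:.1f}x more and takes "
          f"{TX2.area_mm2() / ARM.area_mm2():.1f}x the board area")
```

With these inputs the TX2 wins on watts for the same 2.4 TOPS (about 0.66 W vs 1.2 W), while the ARM board is roughly one sixth the price and one sixth the board area, which is the trade-off described above.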

I ordered a Rock960 from Seeed Studio, which should have the RK3399Pro SoC and its 2.4 TOPS capability. It will be here this week, and I am anxious to qualify this board with some actual wild inference!

