Ubuntu 22.04 VNC stops working

My Ubuntu 22.04 virtual machine has been constantly turning off (crashing?) the VNC server. This is the VNC server that is built-in under Settings -> Sharing -> Remote Desktop -> Enable Legacy VNC Protocol.

One way to restart the VNC server is to uncheck the box, then re-check the box.

However, since that has to be done with a GUI, it is really annoying.

So, here is the script that will stop and then restart the VNC server:

#!/usr/bin/env bash

grdctl vnc disable
sleep 5
grdctl vnc enable

Using this script, once your VNC server crashes, you can ssh into the machine, run the script from the command line, and then re-establish your VNC session.

Another tip: be very careful when closing the “Remote Desktop” dialog, because the “X” to close that dialog is right over the red sliding “Sharing” on/off control. Be careful that you don’t accidentally click again, which will disable all Sharing when closing the “Remote Desktop” dialog.

Posted in Ubuntu | Comments Off on Ubuntu 22.04 VNC stops working

Kaggle, JSON, Python, and pandas

While looking at various datasets from kaggle to do some experiments with Python and graph visualization with networkx, the arXiv dataset caught my attention. It has a (relatively) simple schema – authors, papers, and categories. It is also huge – there are 2,219,423 lines/records in its 3.5GB uncompressed .json file (1.2GB compressed).

The format of that .json file is also a practical example of an issue with JSON – you can’t start with a valid .json file, then add/write/append another record, and end up with valid JSON. (The CSV format allows appending without rewriting the entire file, but CSV has many other issues that make it unattractive for most cases.)

There is a (semi-formal?) standard for the format used by arXiv (even though it is never mentioned by name) – “JSONL” or “JSON Lines”. See this 2022 entry from atatus.com and this jsonlines.org summary. The vocabulary is fun: “Each line has a valid JSON value”. Other vocabulary: “Line-delimited JSON”, “Newline-delimited JSON” and “Concatenated JSON”. The suggested extension is “.jsonl”.

Tool support for JSONL seems to be hit-or-miss. jq supports JSONL (surprise!). The Python library json does not, at least when it is handed the whole file at once (it throws a JSONDecodeError “Extra data: line 2 column 1 (char 1689)”). As the title says – this post is about pandas.
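To make the JSONL behavior concrete, here is a minimal sketch using only the standard library. The two sample records and their field names are made up for illustration (they are not the real arXiv schema). It shows the “Extra data” failure when the whole text is parsed as one document, and that parsing (and appending) one line at a time works.

```python
import json

# Two records in JSON Lines form: one complete JSON object per line.
# The ids and titles are invented sample data, not real arXiv records.
jsonl_text = '{"id": "0704.0001", "title": "A"}\n{"id": "0704.0002", "title": "B"}\n'

# Parsing the whole text as a single JSON document fails after the first object.
try:
    json.loads(jsonl_text)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing line by line works, because each line is valid JSON on its own.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Appending a record is just appending a line; earlier data is not rewritten.
jsonl_text += json.dumps({"id": "0704.0003", "title": "C"}) + "\n"
records_after_append = [
    json.loads(line) for line in jsonl_text.splitlines() if line.strip()
]

print(whole_file_error)
print(len(records), len(records_after_append))
```

So the json library can read JSONL, but only if you split the input into lines yourself.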

Based on many of the code examples, you could easily conclude that pandas does not support JSONL. For a naive implementation, see Arxiv Data analysis (it loads every object into an array, then creates the DataFrame with pd.DataFrame.from_records(array)). The people who run this naive code have either (a) trimmed the number of lines in their input .json to something reasonable, like 100k lines, or (b) have access to machines with 24GB+ of RAM in order to have two copies in memory at the same time.

The github gist that inspired this post was from cj2001/load_arxiv_data.py, which creates the entire array of “metadata”, but with a limit on the number of lines/objects created. It does limit the number of lines, but it is still “the hard way”.

For a slightly more sophisticated implementation, see Scientific Article Recommendation – See In[2] extract_data with “yield line” and In[3] fetch_n_records with list comprehension to load the file into memory:

return [json.loads(record) for record in islice(data_gen, chunksize)]
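A rough sketch of that generator-plus-islice pattern is below. It is not the notebook’s actual code: the helper name iter_lines and the in-memory stand-in file are assumptions for illustration, and only the fetch_n_records name and the islice call come from the description above.

```python
import io
import json
from itertools import islice


def iter_lines(handle):
    """Yield non-empty lines one at a time, so the file is never fully in memory."""
    for line in handle:
        if line.strip():
            yield line


def fetch_n_records(line_gen, chunksize):
    """Parse at most `chunksize` lines from the generator into dicts."""
    return [json.loads(line) for line in islice(line_gen, chunksize)]


# Stand-in for the real file; with the arXiv snapshot this would be open(path).
fake_file = io.StringIO("".join(json.dumps({"n": i}) + "\n" for i in range(5)))

line_gen = iter_lines(fake_file)
first = fetch_n_records(line_gen, 2)   # records 0 and 1
second = fetch_n_records(line_gen, 2)  # records 2 and 3; the generator keeps its place
rest = fetch_n_records(line_gen, 2)    # only record 4 is left

print(first, second, rest)
```

Each call holds only one chunk in memory, which is what makes this better than the load-everything approach, even though it is still more code than necessary.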

In conclusion:

The correct pattern is to use pd.read_json(filename, lines=True, orient='records', nrows=nnn) (as can be seen in ArXiv). Note that leaving off the “nrows=” argument can create an out-of-memory situation, even when nrows=<a huge value> works just fine (e.g. for arXiv, nrows=5000000 works, since there are only 2219423 lines, but leaving off nrows crashed with the OOM killer).

So – add pandas.read_json() to the list of tools that support JSONL natively:

import pandas as pd

# lines=True tells pandas the file is JSONL (one JSON object per line)
df = pd.read_json("arxiv-metadata-oai-snapshot.json",
                  nrows=5000000,
                  orient='records',
                  lines=True)

Posted in Machine Learning | Comments Off on Kaggle, JSON, Python, and pandas

Vue.js 2 versus 3

As of this post’s creation, Vue.js version 3 is available, and is “being promoted” over version 2. For example, a plain “npm install -g @vue/cli” install will use Vue 3.2.41 by default.

Don’t use version 3 at this time.

The reason: Vue.js v3.0 was released 2020/Sep. But Vue.js v3.2 was released 2021/Aug, and there is some sort of “architecture war” (negative spin) or “massive improvement” (positive spin) happening with v3.2

Specifically – v3.2 has introduced the <script setup> coding style, which results in “… difference in module execution semantics”. The documentation and examples for <script setup> are very thin, and seem to (typically) assume you already know how to use it while it is being explained.

Just stick with Vue.js v2 (v2.5 is what I’m using, in “compatibility mode” with “vue init” from the Vue CLI tool suite). After my application is fully functional, I’ll try to upgrade to v2.7 (which is EOL 2023/Dec); hopefully a year from now v3.2 will have caught up in its documentation.

Posted in Software Engineering, Software Project | Comments Off on Vue.js 2 versus 3

Benchmark Disks NVMe vs SSD SATA vs HDD SATA

Benchmarked with Samsung Magician

Device | Seq Read MB/s | Seq Write MB/s | Random Read IOPS | Random Write IOPS
Samsung 980 Pro 1TB NVMe | 6920 | 5174 | 984130 | 297607
Samsung 980 Pro 2TB NVMe | 7007 | 5240 | 1053222 | 924072
Samsung 980 Pro 2TB NVMe when plugged into a PCIe 4.0 x2 mode | 3561 | 3514 | 852294 | 835205
Samsung 850 Pro 256GB SATA | 560 | 526 | 94970 | 81298
WDC 2003FZEX 2TB SATA | 111 | 115 | 238 | 244
Samsung 970 Pro 512GB NVMe | 3559 | 2299 | 362304 | 364476
Intel 660P 2TB NVMe | 1839 | 1938 | 234130 | 226806
Samsung 860 Evo 1TB SATA | 561 | 531 | 97167 | 86669
WD Red WD40EFRX 4TB SATA | 146 | 140 | 244 | 488*

Note: the two 980 2TB rows show an important difference: “PCIe 4.0 x4 mode” versus “PCIe 4.0 x2 mode”. The first 980 1TB is also plugged into an “x4 mode”.

Posted in Computer Builds, Hardware | Comments Off on Benchmark Disks NVMe vs SSD SATA vs HDD SATA

Core i5 Update

This is a rebuild of the Core i5 Sandy Bridge Build machine. This machine also started spontaneously shutting off, with BSOD and freeze events (just like what caused the Core i5-10400 rebuild, previously).

The CPU, motherboard and RAM were changed out. The old motherboard was USB 2 with DDR3; the new motherboard is USB 3 with DDR4, and the RAM got a bump from 16GB to 32GB.

Some facts on the new CPU: it is currently #279 on PassMark [20,970] cpubenchmark.net, with a turbo speed of 4.8 GHz and TDP of 65W to 117W. It is 6 cores/12 threads, first seen Q1 2022.

For now, the old GTX 970 video card is still being used (see Video Card Relative Benchmarks) – it is on its last legs, though, and needs to be upgraded too.

Next step: Windows was no longer activated (new hardware, which is expected). What is different is that the “New Hardware Installed” button is completely borked. It tried to log in to Microsoft, which changed my local account into the “linked email account”. In the end, I just decided to get a new OEM Windows 10 and a new SSD NVMe (because at this point, this system is begging for it).

Update: The old Optiarc DVD drive was not recognized by the motherboard/BIOS as a valid boot device. So – swapped in a newer drive to get Windows installed. Update: splurged, purchased a Blu-ray burner with M-DISC support. Update: M-DISC blanks from Amazon arrived – true M-DISC 25x for $65, regular BD-R 25x for $22. Both are 25GB size.

Update: Bought a RTX 3080Ti for $900 (on sale from $1,200)

Item | Product | Cost
CPU | Intel Core i5 12600 3.3GHz LGA 1700 65W | $230
RAM | Corsair Vengeance RGB Pro 32GB (2x16GB) DDR4 3600 PC4-28800 | $130
Motherboard | ASUS PRIME B660-Plus D4 LGA 1700 2.5Gb LAN, Gen 1 Type-C | $140
Power Supply | Corsair 750CX ATX 12V 80 Plus EPS12V | ($60)
Video | NVIDIA GeForce RTX 3080 Ti 12GB GDDR6X, PCIe x4.0, 10,240 CUDA cores | $900
Case | Antec Three Hundred ATX Mid Tower | ($60)
SS Drive | Samsung 980 Pro 1TB PCIe Gen 4×4 NVMe, 7000MB/sec, M2.2280 | $140
HD Drive | Western Digital Black 1TB SATA 6.0Gb/s 7200RPM 64MB WD1002FAEX | ($88)
DVD/CD | ASUS BW-16D1HT Blu-ray burner with M-DISC support | $80
OS | Windows 10 Professional 64 bit | $130
Keyboard | Corsair Vengeance K70 Mechanical Gaming Keyboard – Red LED – Cherry MX Brown Switches | ($130)
Mouse | Logitech G5 2-Tone 6 Buttons 1 x Wheel USB Wired Laser 2000 dpi Mouse | ($46)
Total | | $

Posted in Computer Builds, Core-i5 | Comments Off on Core i5 Update

HP Workstation

The goal of this machine is to experiment with mid-range video editing and/or be a monster Virtual Machine Server. It was refurbished, from PC Server Parts Certified Refurbished. It kept going in and out of stock. It arrived in a big box that was well packed with foam, and weighed in around 70 lbs. Once unpacked, the computer itself weighed in at 52 lbs. exactly.

Some facts on the CPU: the Intel Xeon E5-2680 v2 @ 2.80GHz is currently #537 on PassMark [12,530] [single thread 1,789] and is a FCLGA2011 at 115W. It first appeared on the charts 2013/Q2. It is a 10 core 20 thread model with turbo speed of 3.6GHz. The “Dual” PassMark for the E5-2680v2 is 20,678, which is #190 overall rank.

Notes on Ubuntu. The process starts with the 2.5″ to 3.5″ converter and a 1TB SSD. The HP has really nice 3.5″ internal bays, with a SATA backplane already wired for power and data. Just slide the new one in, slide out the 500GB SSD, and boot to the installation ISO. First challenge: WiFi did not “create the device” in the GUI, even though the device was recognized. To fix that, just push “+” in Network Connections, and the “Wi-Fi” section shows up. Then, in the upper-right, the wifi menu appears with “Select Network”.

Second challenge – Ubuntu Firefox fails to start with “serial 467 error_code 11 BadAlloc”. (Hand-installing Chrome fails too, with “gbm_wrapper.cc failed to export buffer to dma_buf”.) As a temporary fix, start Firefox by hand with “firefox -safe-mode”. For the final fix, open “Software & Updates” and go to the “Additional Drivers” tab. Under NVIDIA Corporation GF100GL [Quadro 4000], the currently selected driver is “Using X.Org X server – Nouveau display driver from xserver-xorg-video-nouveau”. Change that to the other option, “Using NVIDIA driver metapackage from nvidia-driver-390 (proprietary, tested)”. Restart. Now firefox and google-chrome-stable both work. (Note: I abandoned efforts to get the NVIDIA “390” drivers to install via “NVIDIA-Linux-x86_64-390.147.run”. If you are reading this because you have a Quadro 4000 driver problem under Linux, you may need this additional step.)

All product links are from the actual vendor.

Item | Product | Cost
System | HP Z820 Workstation (Cached PDF) | $1,289
CPU | 2x Intel Xeon E5-2680 v2 2.8Ghz, 10 cores each | incl.
RAM | 256GB (8x 16GB) DDR3 DRAM | incl.
Motherboard | | incl.
Power Supply | single | incl.
Video | 2x Nvidia Quadro 4000 2GB, 1x DVI, 2x DPort | incl.
DVD/CD | Yes | incl.
Case | | incl.
SSD Drive | 512GB |
HDD Drive | 2000GB Hitachi | incl.
Monitor | |
Wi-Fi | TP-Link AC1200 PCIe Dual Band | $34
Keyboard/Mouse | |
OS | Windows 10 Pro | incl.
Software | |
Total | | $1,323
SSD | SAMSUNG 870 EVO Series TB SATA 2.5″ | $138
Converter | Sabrent 2.5″ HDD to Desktop 3.5″ Converter | $13
OS | Ubuntu 22.04 | incl.

Posted in Computer Builds, Hardware | Comments Off on HP Workstation

Amazon S3 ETag Advanced Information

You are probably here because you looked at one of your S3 objects’ ETags, and it had a dash character (“-”) in it. Most of your other ETag values are simple and correct md5sum hashes. But this one is weird.

Or, you’re here because one of your S3 objects’ ETags ends with “-2”, and you’ve looked up multipart, and you’ve seen the multipart documentation around “multipart_threshold” and “multipart_chunksize”, so you know that “-2” means the ETag was computed as two (2) chunks. But things are still not working out.

Or, you’re here because you know that “-2” means two (2) chunks, and you know the default chunk size is 8MB (8*1024*1024 bytes). Which is all super, except the object is 18MB in size – and 8+8 is only 16 – surely S3 is not throwing away chunks? What is going on here?

The TL;DR answer is – S3 uses both 8MB and 16MB as the “default” chunk size (and, I assume, 32MB, 64MB, etc. Once you break the rules, nothing stops you from doing it again.) As a concrete example – the object size was 17,325,568 bytes and the ETag was “c44bfa98b2c188777ed18cb9190e304b-2”. I used the aws cli (aws-cli/2.0.50 Python/3.7.3 Linux) for this upload, so it should have used 8MB chunks, which means the ETag should end in “-3”, not “-2”. Running the code (below) shows that 16MB chunks create a matching ETag using the local file.

I used “calculate_s3_etag” from this stackoverflow post by hypernot [which seems to be in github – but I used the stackoverflow code, not the github code]. I have confirmed the stackoverflow code works with my 30,000+ files – after trying 8MB, then trying 16MB – to compute the ETag from a local file.
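For readers who want the arithmetic spelled out, here is a minimal sketch of the commonly described multipart ETag calculation. It is not the stackoverflow code itself. The function names part_count and multipart_etag are made up for this example, and the single-part fallback to a plain MD5 is an assumption about how non-multipart uploads are hashed.

```python
import hashlib
import math

MIB = 1024 * 1024


def part_count(size_bytes, chunk_bytes):
    """Number of parts a multipart upload of this size would use."""
    return max(1, math.ceil(size_bytes / chunk_bytes))


def multipart_etag(data, chunk_bytes):
    """ETag guess for `data` uploaded in parts of `chunk_bytes` bytes.

    Assumption: a single-part object gets a plain MD5, and a multipart object
    gets MD5(concatenated binary part digests) plus "-<number of parts>".
    """
    digests = [
        hashlib.md5(data[i:i + chunk_bytes]).digest()
        for i in range(0, len(data), chunk_bytes)
    ]
    if len(digests) <= 1:
        return hashlib.md5(data).hexdigest()
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))


# The 17,325,568-byte object from this post: 3 parts at 8MB, 2 parts at 16MB.
print(part_count(17_325_568, 8 * MIB), part_count(17_325_568, 16 * MIB))

# Tiny stand-in payload with a 4-byte "chunk size", just to show the "-3" suffix.
print(multipart_etag(b"0123456789", 4))
```

For a real file you would read it chunk by chunk instead of holding it all in memory, and try 8MB first, then 16MB, until the result matches the ETag S3 reports.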

Other references:

  • Seems to indicate only 8MB and 16MB are chunk sizes (8MB for aws cli aka boto3, and 16MB for s3cmd). Since I’ve only used ‘aws s3 sync …’ to upload files, and I’ve seen ETags “right next to each other” where one uses 8MB and the other uses 16MB, I know this is not a rule. Maybe it’s a “guideline”.
  • Another stackoverflow has the code in python, go, powershell, etc. This article also mentions – but I have not tried this yet:
aws configure set default.s3.multipart_threshold 64MB
  • Pypi has a page that talks about defaults of 5MB, 8MB, 15MB and 16MB
  • This teppen.io post has some information (but the description doesn’t agree with any S3 documentation)
  • This savjee.be post has the implementation in Bash.

Posted in Software Engineering, Storage | Comments Off on Amazon S3 ETag Advanced Information

Dell 3020 SFF

The goal of this machine was to buy a complete (monitor included) machine for less than $300, taxes and shipping included. It was refurbished, from Blair Technology, “on sale” from $480. It arrived in a big box that contained two other big boxes, and was very well packed. The NewEgg page claims its “Date First Available” is “August 25, 2021”. The CPU was “First Seen” in 2013.

Note that Windows 10 Pro OEM itself sells for approximately $145.

Some facts on the CPU: the Core i5-4570@3.2GHz is currently #1103 on PassMark [5,175] and is a LGA1150 at 84W. It is a 4 core 4 thread model with turbo speed of 3.6GHz.

All product links are from the actual vendor.

Item | Product | Cost
System | Dell OptiPlex 3020 SFF Computer | $279
CPU | Intel Core i5-4570 3.2Ghz | incl.
RAM | 8GB (2x 4GB) DDR3 DRAM | incl.
Motherboard | | incl.
Power Supply | | incl.
Video | Intel graphics | incl.
DVD/CD | Thin form factor DVD (broken) | incl.
Case | | incl.
SSD Drive | none |
HDD Drive | 500GB WD | incl.
Monitor | Acer 22″ LCD, DVI and VGA | incl.
Wi-Fi | USB wi-fi adapter | incl.
Keyboard/Mouse | Phillips wireless bluetooth keyboard and mouse | incl.
OS | Windows 10 Pro | incl.
Software | |
Total | | $301
Posted in Computer Builds, Hardware | Comments Off on Dell 3020 SFF

ASUS ROG

This is a prebuilt machine purchase. Asus Model G15CE-B9. (Price comparisons: newegg is selling it for $1,944 via Coldriver20; amazon is selling it for $1,815 via J-Tech Digital).

Some facts on the CPU: it is currently #202 on PassMark [21,574, single thread 3,376] cpubenchmark.net. For a “bundle” CPU, this is pretty good.

All product links are from the actual vendor. More vendor information to follow on RAM/MB/SSD/HDD/etc.

Item | Product | Cost
Machine | ASUS – ROG Gaming Desktop Model G15CE-B9, 2021 release | $1,649
CPU | Intel Core i7-11700F 11th Gen 2.5Ghz LGA1200 65W Eight-Core Desktop, 16 threads | ($310)
RAM | 16GB (2 x 8GB) DDR4 DRAM Desktop Memory |
Motherboard | ASUS LGA 1200 Intel |
Power Supply | Bronze certified |
Video | ASUS RTX 3070 | ($500+)
Case | ASUS Mid Tower Case (Black) |
SSD Drive | 512GB |
HD Drive | 1000GB |
BD/DVD/CD | -none- |
Keyboard | USB |
OS | Windows 11 Home | ($127)
Total | |

Posted in Computer Builds | Comments Off on ASUS ROG

Video Card Relative Benchmarks

Name | Benchmark | CUDA cores | Tensor cores | Notes
RTX 4090 | ? | 16,384 | ? | 24GB GDDR6X $1600 2022/Sep
RTX 4080 | ? | 9,728 | 304 | 16GB GDDR6X $1200 2022/Sep
RTX 4080 | ? | 7,680 | 240 | 12GB GDDR6X $900 2022/Sep
RTX 3090Ti | 29,342 | 10,752 | 336 | 24GB GDDR6X
RTX 3080Ti | 27,250 | 10,240 | 320 | 12GB GDDR6X $900 sale 2022/Sep [$1200 at launch 2021/May]
RTX 3070Ti | 23,587 | 6,144 | 192 |
RTX 3070 | 22,266 | 5,888 | 184 | $500 release price 2020/oct, sold out still in 2021/nov
RTX 3060Ti | 20,398 | 4,864 | 152 | Nvidia founders edition $400, in stock 2022/Oct
RTX 2080 | 18,685 | | |
RTX 2070 | 16,239 | 2560 | 320 | $500 as of 2021/may
GTX 1080 | 14,800 | | |
GTX 970 | 9,639 | | |
RADEON 570 | 6,967 | | |
GTX 1050Ti | 6,332 | | | $240 release price; $300 in 2021/nov
GTX 750Ti | 3,921 | | |
GTS 250 | 613 | | |
GeForce 8400GS | 115 | | |

Source: https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+RTX+3070&id=4283

Posted in Hardware | Comments Off on Video Card Relative Benchmarks