Tuesday, 23 August 2016

A few comments on hash-codes

Apropos of my comments on databases, there is something that any extensive database must have, and that is a "guaranteed" unique key associated with the root entry for any set of related data - a unique account number, if you like.

There are, of course, ways and means to achieve something that approximates this, and most of those means use things like MD5 hashes or Globally Unique Identifiers (GUIDs).

A GUID is a 128-bit number expressed as 32 hexadecimal digits. GUIDs are generated with a small but finite chance of collision, but for most purposes they are effectively unique.

But the question is, how does one go about generating these numbers?

The answer is, simply, complicated mathematical functions.
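In practice those functions come ready-made in most languages. As a rough illustration (in Python rather than Excel, purely for brevity, and with a made-up sample record), the sketch below shows that a random GUID and an MD5 digest both come out as 128 bits, that is, 32 hexadecimal digits. The function names are only for this example.

```python
import hashlib
import uuid


def random_guid_hex() -> str:
    """Return a random (version 4) GUID as 32 hexadecimal digits."""
    return uuid.uuid4().hex


def md5_hex(text: str) -> str:
    """Return the MD5 digest of some text as 32 hexadecimal digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(random_guid_hex())                 # different on every run
    print(md5_hex("M00000 sample record"))   # same input, same digest
```

The GUID is different every time it is requested, whereas the digest depends only on the text fed in.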

Now, we want a unique identifier that is automatically generated, that is associated with the root entry for each accession in our Database of Curiosities, and that remains the same no matter what else we change at a later date. For that purpose, it is possible to come up with a relatively simple way of hashing some kind of data to produce what is called a message digest.

In this case, the entire primary entry is combined with a date and time string and the (supposedly unique) accession number, and then mangled in order to produce a 128-bit code.

It doesn't matter that the code is meaningless to your eyes; it is a reference number that should be effectively unique in the world. If you then prefix that code with a hexadecimal representation of the accession number, then two records can only share an identifier if they share an accession number, so you have an excellent chance that there will be no collisions anywhere or at any time.

Since we are not using this method to generate data integrity checksums, nor are we using it to encrypt data, there is no need for the complexity of the MD5 or GUID generation code.

I constructed this function in Excel for the purpose of testing, and so here is the code:

(apologies for the long lines)

' Simple hashing function to generate a record identifier which is very unlikely to collide with other
' record identifiers.

Public Function HashRecord(strIn As String, strTime As String, strID As String) As String
    ' strIn -       The data string to be hashed.
    ' strTime -     A string representation of the date and time of the function call.
    ' strID -       A string representation of the accession number
   
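    Dim numHash(18) As Integer      ' working array; only elements 1 to 16 hold the digest
    Dim strFeed As String, strNext As String
    Dim hexHash As String, hexHashX As String
    Dim i As Long, j As Long
    Dim chrVal As Integer, byCarry As Integer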
   
    strFeed = strTime & strID & strIn & strID & strTime
   
    For i = 0 To 18         ' explicitly zero the working array.
        numHash(i) = 0
    Next i
   
    byCarry = 0
   
    For i = Len(strFeed) To 1 Step -1
        strNext = Mid(strFeed, i, 1) ' extract the next character
        chrVal = Asc(strNext)        ' convert the character into an 8-bit ASCII value
       
        numHash(1) = ((numHash(1) * 8) + byCarry) Xor chrVal
            ' shift the binary bits of the number up 3 bits, add whatever
            ' carry value came from the last calculation and XOR with the
            ' next byte of the string to be hashed.
           
        byCarry = 0 ' the carry value
       
        If numHash(1) > 255 Then    ' calculate the carry, and the value of the first byte
            byCarry = Int(numHash(1) / 256)
            numHash(1) = numHash(1) And 255
        End If
       
        For j = 2 To 16             ' ripple the change up the entire array
            numHash(j) = (numHash(j) * 8) + byCarry
            byCarry = 0
            If numHash(j) > 255 Then
                byCarry = Int(numHash(j) / 256)
                numHash(j) = numHash(j) And 255
            End If
        Next j

    Next i

    If byCarry Then    ' if there is still a carry, then wrap it around and add it into the first byte.
        For j = 1 To 16
            numHash(j) = numHash(j) + byCarry
            byCarry = 0
            If numHash(j) > 255 Then
                byCarry = Int(numHash(j) / 256)
                numHash(j) = numHash(j) And 255
            End If
        Next j
    End If
   
    hexHash = ""
   
    For j = 1 To 16 ' convert to hexadecimal (with leading zeros)
        hexHash = hexHash & Right("00" & Hex(numHash(j)), 2)
    Next j
    hexHash = Left(LCase(hexHash) & String(32, "0"), 32) ' fix the length, and make the letters lower case

    ' break up the string of digits into four groups of eight
    hexHashX = Left(hexHash, 8) & "-" & Mid(hexHash, 9, 8) & ":" & Mid(hexHash, 17, 8) & "-" & Mid(hexHash, 25, 8)
   
    hexHash = ""
    For i = 1 To Len(strID) ' convert the accession number to hexadecimal
        hexHash = hexHash & Right("00" & Hex(Asc(Mid(strID, i, 1))), 2)
    Next i
   
    hexHash = Right("0000000000000000" & hexHash, 16) ' just the last 8 characters (16 hex digits) are used

    HashRecord = Left(hexHash, 8) & "-" & Right(hexHash, 8) & ":" & hexHashX
   
End Function

The output looks something like this:


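00004D30-30303030:66b531da-92377f48:7d279dcd-6e7d107

The first two groups are the accession number (M00000), the remaining four groups are the digest. The last group in this sample is one digit short of eight, because the run that produced it took the final group from position 26 of the digest rather than position 25.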
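For anyone without Excel to hand, the listing above can be sketched in Python. This is a port under stated assumptions, not a definitive version: the name hash_record is mine, each character is treated as a single byte, and the last group is taken from position 25 so that all 32 digits of the digest appear. In effect the function rotates a 128-bit number left by three bits for every character and XORs the character into the lowest byte.

```python
def hash_record(data: str, timestamp: str, accession: str) -> str:
    """Sketch of HashRecord: 128-bit rotate-by-3-and-XOR digest with an accession prefix."""
    feed = timestamp + accession + data + accession + timestamp
    digest = [0] * 16            # sixteen 8-bit cells, lowest byte first
    carry = 0

    for ch in reversed(feed):    # the VBA loop walks the string backwards
        value = ord(ch) & 0xFF   # assumption: one byte per character
        first = ((digest[0] << 3) + carry) ^ value
        carry, digest[0] = first >> 8, first & 0xFF
        for j in range(1, 16):   # ripple the carry up the array
            shifted = (digest[j] << 3) + carry
            carry, digest[j] = shifted >> 8, shifted & 0xFF

    for j in range(16):          # wrap any final carry back into the first byte
        total = digest[j] + carry
        carry, digest[j] = total >> 8, total & 0xFF

    hex_digest = "".join(f"{b:02x}" for b in digest)
    groups = (f"{hex_digest[0:8]}-{hex_digest[8:16]}:"
              f"{hex_digest[16:24]}-{hex_digest[24:32]}")

    # accession number as hexadecimal, padded or trimmed to 16 digits
    acc_hex = "".join(f"{ord(c) & 0xFF:02X}" for c in accession)
    acc_hex = ("0" * 16 + acc_hex)[-16:]
    return f"{acc_hex[:8]}-{acc_hex[8:]}:{groups}"


if __name__ == "__main__":
    print(hash_record("Galena, made-up entry", "2016-08-23 12:00:00", "M00000"))
```

Because every step is just a rotate and an XOR, this is nowhere near as well mixed as MD5; it is the accession-number prefix that does most of the work of keeping identifiers apart.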


Of course, it is quite possible that I will simply end up using something much simpler.

Monday, 22 August 2016

The Curious Curator

Introduction: A little history

I have, in the past, made oblique reference to my main hobby which is, contrary to most beliefs, not based in computing or electronics.

I have to confess, here and now, that my passion is for minerals and all things mineralogical.

I live close to an area which has hosted all manner of metalliferous (and non-metalliferous) mining for centuries - an area with a rich industrial history, and even after a century of dereliction, a host of minerals to find and to collect.

For anyone who collects anything at all with more than a casual seriousness, an important part of that collection is the keeping of records.

Going from a simple stock-book to a simple electronic card-index took time (and lots of cramped fingers). Advancing from a card-index program to a database took even more time. The database I chose was what was available to me - Microsoft Access, as a part of the Office 97 suite.

That edition of Access is getting somewhat long in the tooth now, and is not completely happy running under Windows 7. I cannot, though, justify spending money on a more recent copy of Access; I can easily justify spending time and effort on migrating to a new database.

Over the past four years, I have played with ideas, fiddled with software and tried stuff out, with the following results -

  • The available Database Management Systems are many and varied - none comes even close to the visual form/report design capabilities of Access (forms that require zero programming!)
  • The available database systems, while excellent, are designed for vast amounts of data presented in a strict and inflexible format. Most of them are Relational Database systems.
  • The available database systems store data in table files that are not, at any stretch of the imagination, human readable or even human friendly.
  • Losing a database file will, unless it is properly backed up, mean losing the data within the file - in its entirety.
  • Losing the software in an upgrade, likewise, means losing access to your data.
  • Most database systems can reference external files (documents), and some may even incorporate those documents within the database itself - in their own, human-incompatible format.

My answer to this is a novel database that uses the file-structure of the storage medium (disc drive) to organise text files that contain various indices and data sets (as documents), much as physical documents would be stored in a museum's collection records.

Thus we have the first inkling of the Curious Curator for a Cabinet of Curiosities.

The logo is a crown, in fact one drawn well over a century ago by John Tenniel as a part of an illustration in a children's story book.

Why a crown, and that one in particular? A favourite quotation of the young girl who wore it.

"Curiouser and curiouser," said Alice.

The whole tenet of the program is that each and every document is stored in such a way that any system can retrieve the data without recourse to any particular piece of software.

Other than media files (images, video etc.), everything is stored as plain text.

Whilst this is not the most efficient manner of storing data, either space-wise or for access speed, it is robust in the extreme. Damage to a single file loses one record-card's worth of data (which may well be recoverable in seconds from a master table, although that table is not easily accessible to the human reader).

Documents, file cards, image galleries and so on can be constructed using simple tools and text files.
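To give a flavour of the idea, here is a minimal sketch of one record card kept as a plain text file in a folder named after its accession number. Everything in it is an assumption made for illustration: the folder layout, the file name record.txt, the "key: value" line format, the sample field and the choice of Python. None of it is the design of the Curious Curator itself.

```python
from pathlib import Path
import tempfile


def write_record(root: Path, accession: str, fields: dict) -> Path:
    """Store one record card as plain 'key: value' lines in its own folder."""
    folder = root / accession[:1] / accession    # hypothetical layout: <root>/M/M00000/
    folder.mkdir(parents=True, exist_ok=True)
    card = folder / "record.txt"
    lines = [f"{key}: {value}" for key, value in fields.items()]
    card.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return card


def read_record(card: Path) -> dict:
    """Read a record card back; lines without a 'key: value' shape are skipped."""
    fields = {}
    for line in card.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        card = write_record(Path(tmp), "M00000", {"Name": "Galena"})
        print(card.read_text(encoding="utf-8"))   # readable in any text editor
        print(read_record(card))
```

The point of the exercise is that the file on disc can be opened and understood with nothing more than a text editor.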

While the software is still in the planning stage, I have broken ground on it - having decided on the programming language, platform and delivery medium. I have also managed to get together in my mind the various tools and techniques that will be brought to bear.

At some point, the project will be available for download, comment, testing and piracy from a project page on SourceForge.

And, as with most things I do, the tools and software I will be using are all open source, as will be the Curious Curator Database.

https://sourceforge.net/projects/curious-curator/

Just because I happen to like the metaphor of filing cabinets, card indexes and manila folders in an old museum office, here is a quick and dirty, preliminary mock-up of one of the pages I have planned ...


And just in case you don't understand the reference to the Cabinet of Curiosities - that is the early name given to private collections of objects (curiosities or curios), which 'cabinets' (often suites of rooms given over to them) eventually became the museums we know today.

Friday, 6 May 2016

Portable Power - recycled Lithium-Ion batteries

While considering portable equipment, one of the major considerations is always the power supply.

It has long been understood that portable devices are cheaper to run from a rechargeable battery. The most energy efficient technology, at the moment, is Lithium Ion.

Now, when working with lead-acid cells, charging and using the battery is simple - current in charges, current out discharges. The only real considerations are that overcharging reduces the life expectancy of the battery, and deep discharge can damage individual cells of a battery.

Nickel-Cadmium (NiCd) and Nickel Metal-Hydride (NiMH) require a bit more care, in that overcharging can do a lot of damage to the cells, deep discharge can do the same, and there is often quite a bit of heat involved with re-charging damaged cells.

Lithium Ion (Li-Ion) and Lithium-Polymer (Li-Po) cells are even more tricky in that they are extremely intolerant of any kind of charging/discharging abuse at all. Since they have a flammable organic electrolyte, they can react to physical damage or overcharging by bursting into flames.

On the plus side, they have a particularly high energy density, are available in all manner of shapes and sizes, and they have an output voltage in excess of 3V (typically in the region of 4.1V at full charge).

Just to make the Li-Ion cell more attractive, it is possible to obtain miniature charger modules for them, and tiny DC-DC converter units, so that they can be charged with a standard 5V power supply (such as a USB socket on your computer), and can deliver whatever voltage you need at a fairly hefty current - and with a better than 90% efficiency.

The better charger modules even disconnect the load if the battery is close to going into deep discharge.


I found myself, some time ago, in possession of four thin, flat pouch cells giving a total storage of several ampere-hours. On their own, they were too delicate to use without a case, and I didn't have room in my project case to install them. Additionally, I didn't want my Li-Ion batteries to be enclosed in a plastic case ... just in case of fire.

I ended up using the bottom covers of a couple of old CD-ROM drives to make the PSU chassis, holding the batteries in place using neoprene foam tape, double-sided fixative for the circuitry and ordinary PVC electrical tape to hold the top and bottom covers in place.

This was still not sturdy enough, so I put together the top and bottom covers from an old (burned out) CB radio, using epoxy to join the two clam-shell halves, mounting a polycarbonate base on the resulting case, and a matching face-plate for the other end.

The face plate was glued to the battery chassis and the chassis slid into the clam-shell case - the whole was secured using that same PVC electrical tape.

The result is a fairly heavy battery pack that is armoured, pretty close to fireproof and can be opened if required (but is essentially permanently closed). It is also waterproof (always a good thing in field equipment).

And the output? 5.0V at up to 2A.
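As a back-of-envelope check on how long such a pack might last, the arithmetic can be sketched as below. The 4 Ah capacity and 3.7V nominal cell voltage are assumed figures for illustration only (the text above says no more than "several ampere-hours"); the 90% converter efficiency and the 5V, 2A output are the figures quoted above. The function name and the use of Python are likewise just for this example.

```python
def runtime_hours(capacity_ah: float, cell_voltage: float, efficiency: float,
                  out_voltage: float, out_current: float) -> float:
    """Estimate run time from stored energy, converter efficiency and output load."""
    stored_wh = capacity_ah * cell_voltage      # energy held in the cells
    delivered_wh = stored_wh * efficiency       # what survives the converter
    load_watts = out_voltage * out_current      # power drawn at the output
    return delivered_wh / load_watts


if __name__ == "__main__":
    # Assumed pack: 4 Ah at 3.7 V nominal; load: the full 5 V at 2 A.
    hours = runtime_hours(4.0, 3.7, 0.90, 5.0, 2.0)
    print(f"about {hours:.2f} hours at full load")
```

At lighter loads the run time scales up in proportion, so a device drawing a quarter of the current should run roughly four times as long.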

For the builders of home-grown portable equipment, I would always recommend putting your Li-Ion or Li-Po battery pack in a separate, steel or thick aluminium case, just in case of fire.

A final couple of notes on Lithium rechargeable cells:

Never, ever charge Lithium Ion or Lithium Polymer cells unattended unless placed in a fire-proof enclosure and away from other flammable materials - they can be unpredictable.

Regularly inspect the cells for inflation (bulging) as this is an indication of cell failure. Such cells should be disposed of properly.



In case of Li-Ion/Li-Po cell fire or uncontrollable overheating - place the entire, disconnected battery, complete, in a metal bucket filled with cold water (preferably with a little salt in the water) and put the bucket somewhere outside, away from flammable material, people and animals. This will cool the battery and discharge it. Leave it there for at least 72 hours, and assume that it is going to catch fire if you bring it out before then. Whatever circuitry is attached to the cell(s) may be assumed lost.

Likewise, if you have a Li-Ion/Li-Po cell that becomes punctured (they smell lovely), drop it in a bucket of slightly saline water and put it outside.

Once safe, it can be retrieved and disposed of through an appropriate route.




Wednesday, 27 April 2016

Smart VDU

One of the biggest problems with producing field data is the display of that data.

When it is just temperature and humidity, then a simple LCD text display will do, but once you have a big stack of data to summarise, then you need something a little more complicated.

The Graphical LCD to the rescue.

These quarter-VGA (320 x 240 pixel) displays, in high colour and with suitable libraries installed, are capable of displaying text and graphics with little hassle. On something like a Raspberry Pi, you will be able to display a complete graphical interface.

On something smaller, like the Arduino, you are much more limited. With a mere 32kB of program space, you have to decide what you really need to support. Happily, a low-end Arduino is cheap enough to be used as a fairly simple terminal device, controlling a colour VDU type display and interpreting a series of external commands.

It has taken numerous experiments over the past few months to get to the point where I can begin to specify a simple terminal device for the Open Sensorium Field Buddy project.


Two types of display.

There are two basic types of text display - the teletype and the smart VDU.

The teletype display, a dumb terminal, simply takes serial data and adds it to the bottom of the screen, scrolling the page up at the end of the line. Glass teletypes were once the height of cool for user interfaces.

The smart VDU, on the other hand, is able to accept a series of commands that allow the screen to display a page of data, and to update it in place without scrolling.

The standard for this type of VDU is the VT52 standard (from the DEC VT52 terminal), and later the VT100, with a few more advanced features.

Given the speed of an Arduino or similar microcontroller, and given the limited program storage, the VT52 feature set is just about do-able. Yes, it is possible, but ugly and unreliable. The problem is that commands may be embedded anywhere in the data-stream. Line ends are not necessary, and you need to be able to handle malformed commands. The addition of a dumb-mode scrolling screen makes everything even more cumbersome.

NMEA 0183 as a compromise.

There exists a standard that could be the answer. The NMEA 0183 standard [https://en.wikipedia.org/wiki/NMEA_0183] sets out a specification for both hardware and message format protocols for communication between various maritime instruments. Each message packet (sentence) begins with a "$" and a message code and ends with a checksum - and the packet length is limited to 82 characters (including the leading "$" and the end-of-line CR-LF sequence).

There is no NMEA sentence which is designed for the purpose of controlling a VDU, but there are plenty of unused mnemonics that could be used. Since this is NOT an official project for maritime use and is NOT authorised by the NMEA [http://www.nmea.org/], you should be aware that we are using their proprietary standard as a framework for an unassociated, amateur purpose.

CAUTION: You should not use this project in conjunction with any maritime equipment upon which you rely for navigation or with other safety-related instrumentation. To do so may render your existing installation unsafe or inoperative.

In addition to the use (or abuse) of the NMEA message format, the Arduino and similar microcontrollers will need some additional help - in the form of hardware handshaking. The small UART receive buffer on a 32kB machine that runs at 8 or 16 MHz is just not able to cope with a continuous stream of data, even at 4800 baud (as specified by the standard).

Message Format.

The standard NMEA sentence has the following format:

$AAAAA,field,field,...*FF<CR><LF>
where AAAAA contains an originating device and a sentence type.
A series of comma-separated text fields may contain numeric or textual data (depending on the sentence type).
*FF is an 8-bit checksum, written as two hexadecimal digits, formed by XORing together every character between the $ and * signs.
<CR><LF> are the carriage return and linefeed characters used to delineate the end of a line of text on a teletype.
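The NMEA checksum is nothing more than the exclusive-OR of every character between the $ and the *. A minimal sketch, written in Python for convenience rather than in Arduino code, and with function names that are only for this example:

```python
def nmea_checksum(payload: str) -> str:
    """XOR every character of the payload; return two uppercase hex digits."""
    value = 0
    for ch in payload:
        value ^= ord(ch)
    return f"{value & 0xFF:02X}"


def frame_sentence(payload: str) -> str:
    """Wrap a payload as $payload*FF followed by carriage return and linefeed."""
    return f"${payload}*{nmea_checksum(payload)}\r\n"


if __name__ == "__main__":
    print(repr(frame_sentence("GPVDU,Wri,4.5,3,,,,,Some text data")))
```

Note that the $ and the * themselves are not included in the calculation.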

For the purposes of VDU control, the sentence structure will be as follows:


$xxVDU, Command, Field, Field, Field, Field, Field, Field, Text *FF

The xx in the sentence header denotes the originator of the message, but is irrelevant for the purposes of the VDU - and is included purely for debugging when using a simple serial display on a host computer.

The command mnemonic tells the VDU what to do with the rest of the data.

The fields (six in all) are the parameters used by the command.

Text is the text to be displayed (if any) as part of the command. This is limited to 40 characters by the program, but is physically limited to whatever the width of the display happens to be. Null fields must still be supplied, as the line is a fixed number of comma separated values.
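The fixed layout makes sentences easy to build and to check. The sketch below is a host-side illustration in Python, not the Arduino firmware: the function names are hypothetical, it writes the sentence without the optional spaces after the commas, and it assumes that any commas appearing after the eighth belong to the text.

```python
def checksum(payload: str) -> str:
    """XOR of all payload characters as two uppercase hex digits."""
    value = 0
    for ch in payload:
        value ^= ord(ch)
    return f"{value & 0xFF:02X}"


def build_vdu(command: str, fields=(), text: str = "", talker: str = "GP") -> str:
    """Build $xxVDU,Command,six fields,Text*FF with null fields padded in."""
    if len(fields) > 6:
        raise ValueError("at most six fields")
    if len(text) > 40:
        raise ValueError("text is limited to 40 characters")
    padded = [str(f) for f in fields] + [""] * (6 - len(fields))
    payload = ",".join([f"{talker}VDU", command, *padded, text])
    return f"${payload}*{checksum(payload)}\r\n"


def parse_vdu(sentence: str):
    """Check the checksum and split a sentence into (command, fields, text)."""
    body = sentence.strip()
    if not body.startswith("$") or "*" not in body:
        raise ValueError("not a framed sentence")
    payload, _, received = body[1:].rpartition("*")
    if checksum(payload) != received.upper():
        raise ValueError("checksum mismatch")
    parts = payload.split(",", 8)       # header, command, six fields, text
    if len(parts) != 9 or not parts[0].endswith("VDU"):
        raise ValueError("not a VDU sentence")
    return parts[1].strip(), [p.strip() for p in parts[2:8]], parts[8]


if __name__ == "__main__":
    line = build_vdu("Wri", [4.5, 3], "Some text data")
    print(repr(line))
    print(parse_vdu(line))
```

A sentence that arrives with the wrong number of fields or a bad checksum is simply rejected, which is a good deal easier than recovering from a malformed escape sequence.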

Examples:

Write text:
    $GPVDU, Wri, 4.5, 3,,,,,Some text data*FF
will write "Some text data" at line 4.5 and beginning at column 3.

Draw line:
    $GPVDU, Drw, 5.0, 2.5,5.0,16.5,,,*FF
will draw an underline below the above text, starting a half-character before and ending a half-character after the "Some text data".

Draw Glyph:
    $GPVDU, Gly0, 4.5, 17,5,,,,*FF
will draw glyph number 5 shortly after the text above. Glyphs (simple bitmaps) are treated as additional characters. Facility for different widths of glyph is taken care of in the program.

Change Colour Scheme:
    $GPVDU, Csch, 15, 0x0000,0x8410,0xFFFF,0x31CA,,*FF
will change the custom colour scheme to grey on black with bright and dim shades for highlighting.

Set LEDs:
    $GPVDU, LED, 0x8,0x1,,,,,*FF
will switch on LED 4 and switch off LED 1. Up to 8 LEDs may be specified (although 4 is standard).
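Two of the examples above carry packed values, and a short sketch may make them clearer. It rests on assumptions: that the colour values are 16-bit RGB565 (5 bits red, 6 green, 5 blue, which is common on small colour LCDs but is not stated here), that the first LED field is an "on" mask and the second an "off" mask as the example suggests, and that "off" wins if a bit appears in both. The function names and the use of Python are just for illustration.

```python
def rgb565_to_rgb888(colour: int):
    """Expand an assumed RGB565 colour into 8-bit (red, green, blue)."""
    r5 = (colour >> 11) & 0x1F
    g6 = (colour >> 5) & 0x3F
    b5 = colour & 0x1F
    # replicate the top bits into the low bits so that full scale maps to 255
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def apply_led_masks(state: int, on_mask: int, off_mask: int) -> int:
    """Return the new 8-bit LED state; bit 0 is LED 1, bit 3 is LED 4."""
    return (state | on_mask) & ~off_mask & 0xFF


if __name__ == "__main__":
    print(rgb565_to_rgb888(0x8410))            # the mid grey from the example
    print(bin(apply_led_masks(0b0001, 0x8, 0x1)))
```

Under those assumptions, 0x8410 is a mid grey and the LED example leaves only LED 4 lit.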


Font:

In order to simplify the software, a fixed-width font is specified.
Bold and underline highlights, as well as bright, normal and dim brightness, give a wide range of highlight options, along with the possibility of using additional colours for text/background.

Using the character cell based line drawing, forms and dividing lines may be generated.

One odd characteristic of this interface is the non-integer line and column numbers - this allows for a half-space to be used to separate data on the screen rather than wasting a full line/column.
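Turning those half-cell positions into pixels is simple arithmetic. The sketch below assumes a character cell of 8 x 16 pixels (the 8 follows from 40 characters across a 320-pixel screen; the 16 is a guess), and assumes that line 0, column 0 is the top-left corner. It is an illustration in Python with a made-up function name, not the display code itself.

```python
CELL_WIDTH = 8     # assumed: 320 pixels / 40 characters
CELL_HEIGHT = 16   # assumed cell height in pixels


def cell_to_pixel(line: float, column: float):
    """Convert a (line, column) position in half-cell steps to pixel (x, y)."""
    if (line * 2) % 1 or (column * 2) % 1:
        raise ValueError("positions must be whole or half cells")
    return int(column * CELL_WIDTH), int(line * CELL_HEIGHT)


if __name__ == "__main__":
    print(cell_to_pixel(4.5, 3))   # the "Write text" example position
```

With an even cell height and width, every half-cell position lands on a whole pixel, which keeps the drawing code free of rounding.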

Handshaking:

A simple Request To Send - Clear To Send handshaking should be used. The CTS line should be held low only until the first byte appears in the buffer. The sending device will transmit one complete sentence, and will then pause and wait for another CTS to be signalled.

Buttons:

The VDU can also transmit codes associated with a small number of buttons. Since each button is represented by a single byte, handshaking is not required for data sent by the VDU.