[LINK] The "health" record security model
Jim Birch
planetjim at gmail.com
Tue Nov 13 10:52:20 AEDT 2018
David wrote:
> But the problem with MHRecord lies in it's unknown objectives
Please explain what you imagine these "unknown objectives" might be, in
concrete language, and how they might hurt me. It sounds very like fairies
at the bottom of the garden talk. Sorry, goblins.
> Longitudinal studies have to be reasonably well-controlled to be reliable,
> and a collection of random PDFs is unlikely to cut it.
Longitudinal studies are not actually controlled studies; they're a different
thing. I'm not 100% sure in what sense you are using the word "random" here,
unless it is just a generalized pejorative. The data in MyHR is not complete.
However, completeness is rare in experimental data sets in medical science,
and in science generally. A slew of statistical methods has been developed
to deal with incomplete data sets. Google and Facebook have been
incredibly economically successful working with incomplete data sets;
however, their primary objective is to sell stuff, not improve population
health. Big data has been incredibly successful in lots of areas and there
is no good reason to think it won't work in health science - or health
economics. As a matter of fact, big data is already being used
successfully in health; go look.
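(For anyone wondering what "a slew of statistical methods" looks like in
practice, here is a minimal sketch in Python using pandas. The column names
are invented for illustration; a real analysis would prefer something like
multiple imputation over simple mean filling.)

    # Minimal sketch: two common ways of handling an incomplete data set.
    # The column names (age, hba1c, bp_systolic) are invented examples.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        "age":         [54, 61, np.nan, 47, 70],
        "hba1c":       [6.1, np.nan, 7.4, 5.9, np.nan],
        "bp_systolic": [130, 145, 150, np.nan, 160],
    })

    # 1. Complete-case analysis: drop any record with a missing value.
    complete_cases = df.dropna()

    # 2. Simple mean imputation: fill gaps with the column average.
    #    (Real studies would prefer multiple imputation or model-based methods.)
    imputed = df.fillna(df.mean())

    print(complete_cases)
    print(imputed)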
"PDFs" also appears to be a pejorative term here. Just so you or anyone
still tuned in knows, I'll explain it: the basic reason PDFs were
used is that they are the existing system. Doctors look at text records. It
is what thousands of bits of healthcare software in hospitals and labs
produce. It's the format that gets checked and approved. Ideally, from an
abstract data perspective at least, health records would use some kind of
structured XML-like format, clearly and unambiguously. There are two
primary problems: the scale of change on the source side, and creating the
data standards. There is no unified common standard for naming medical
symptoms or diagnoses. Names change from place to place. Standardisation
requires doctors to change the names of their diagnoses.
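(To make the contrast concrete, here is a rough sketch in Python of the
difference between the free text a clinician signs off and the structured
record a data-oriented system would want. Every field name and flag below is
made up for illustration, not taken from any real standard; agreeing on the
real equivalents across labs and jurisdictions is exactly the hard part.)

    # What a signed-off report effectively is today: free text for humans.
    report_text = """
    Serum potassium 5.8 mmol/L (ref 3.5-5.2) - sample slightly haemolysed,
    suggest repeat. Reviewed and signed: Dr A. Clinician
    """

    # What researchers would prefer: unambiguous, machine-readable fields.
    # All names and codes here are invented for illustration only.
    structured_result = {
        "analyte": "serum_potassium",
        "value": 5.8,
        "unit": "mmol/L",
        "reference_range": [3.5, 5.2],
        "flags": ["sample_haemolysed", "repeat_suggested"],
        "signed_off_by": "Dr A. Clinician",
    }

    print(structured_result["analyte"],
          structured_result["value"],
          structured_result["unit"])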
Similarly, medical testing is done differently from place to place using
different standards and different equipment. It is often annotated to
indicate problems with a sample or an interpretation. The process has
multiple checks to ensure reliability, culminating in the check and sign-off of
the final text by a senior clinician. The clinician does not sign off an
XML data set, and they would be rightfully wary of attaching their sign-off to
one. There are ongoing moves towards standardisation and towards abstracting
data from presentation, but these are slow and careful processes that
will take years. We will be stuck with PDFs for some time.
Do PDFs present a problem for researchers? Yes. Do they think they can
handle it? Yes. If Google can reliably determine street numbers in all
kinds of formats from photos, extracting a particular data element from a
PDF blood test will be relatively easy. The data doesn't have to be
perfect; real-world datasets are not perfect.
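(As a rough illustration of why this is considered tractable - and not a
claim about how any actual project does it - once the text has been pulled
out of a PDF with a library such as pdfminer.six, extracting a single
analyte is mostly pattern matching. The report wording below is invented.)

    # Sketch only: extract one value from text already pulled out of a PDF
    # (e.g. with pdfminer.six's extract_text). The report text is invented.
    import re

    page_text = "Full blood count ... Haemoglobin 138 g/L (ref 130-180) ..."

    match = re.search(r"Haemoglobin\s+(\d+(?:\.\d+)?)\s*g/L", page_text)
    if match:
        haemoglobin = float(match.group(1))
        print(f"Haemoglobin: {haemoglobin} g/L")
    else:
        print("Not found - fall back to manual review or a better parser")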
What researchers are excited by is the numbers. Rather than running an
expensive longitudinal study or RCT over a few hundred participants that
struggles to achieve statistical significance, they are looking at
n = 100,000 or 5,000,000 real-world trials. The data is of course different,
weaker in many respects but stronger in others. Meshing epidemiological
studies with trials is normal in medical science, but we can expect to see
more good epidemiological studies. Epidemiological studies are highly
regarded in medical science for very good reasons that I won't go into, but
you can check this if you are interested.
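(The attraction of the big numbers is easy to show on the back of an
envelope: the uncertainty in an estimated mean shrinks roughly with the
square root of the sample size, so - all else being equal, which it never
quite is - n = 100,000 buys far tighter estimates than n = 300. The standard
deviation of 1.0 below is arbitrary; only the ratio matters.)

    # Back-of-envelope: standard error of a mean scales as sigma / sqrt(n).
    import math

    sigma = 1.0
    for n in (300, 100_000, 5_000_000):
        se = sigma / math.sqrt(n)
        print(f"n = {n:>9,}  standard error ~ {se:.4f}")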
Jim