Voice-signature-based Speaker Recognition
Abstract
Personal identification and the protection of data are important issues because of the ubiquity of computing, and they have thus become active areas of research in computer science. Previously, people have used a variety of means to identify individuals and to protect themselves, their property and their information, mostly locks, passwords, smartcards and biometrics.
Verifying individuals by their physical or behavioural features is more secure than using other data such as passwords or smartcards, because everyone has unique features that distinguish him or her from others. Furthermore, the biometrics of a person are difficult to imitate or steal. Biometric technologies represent a significant component of a comprehensive digital identity solution and play an important role in security. The technologies that support identification and authentication of individuals are based on either their physiological or their behavioural characteristics.
Live data, in this instance the human voice, is the topic of this research. The aim is to recognize a person’s voice and to identify the user by verifying that his or her voice matches the record of his or her voice signature in a system’s database. To address the main research question, “What is the best way to identify a person by his or her voice signature?”, design science research was employed. This methodology is used to develop an artefact for solving a problem.
Initially, a pilot study was conducted using visual representations of voice signatures to check whether it is possible to identify speakers without using feature extraction or matching methods. Subsequently, experiments were conducted with 6300 data sets derived from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) audio database.
Two feature extraction methods were considered, mel frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC), and the Support Vector Machines (SVM) method was used for classification.
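As a rough illustration of the idea behind mel frequency cepstral features, the sketch below converts between hertz and the mel scale and places filter centre frequencies evenly on that scale. It is not taken from this work: the 2595/700 mel formula is one commonly used variant, and the function names, the filter count of 10 and the 0 to 8000 Hz range are assumptions chosen only for the example.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to mels (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Return n_filters centre frequencies (Hz) spaced evenly on the mel scale.

    The two band edges f_min and f_max are excluded, as is usual when the
    centres are later used to build triangular filters.
    """
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 0 Hz and 8000 Hz.
    centers = mel_center_frequencies(n_filters=10, f_min=0.0, f_max=8000.0)
    print([round(c, 1) for c in centers])
```

The centres come out closely spaced at low frequencies and widely spaced at high frequencies, which is the perceptual weighting that distinguishes mel-based features from a linear frequency analysis.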
The methods were compared in terms of their effectiveness, and it was found that the system using mel frequency cepstral coefficients for feature extraction gave marginally better results for speaker recognition.