In this series, our chief scientist invites guests active in various fields related to sound for conversations on themes around "sound". For the third installment we welcomed the audiovisual critic Reiji Asakura, who spoke with our president about the sound research and development at final that produced the ZE8000. Part 2 is titled "Is the adhesive the culprit?!"
↓ The conversation can also be viewed in the video below.
Asakura: My impression on listening to the ZE8000 is that it is genuinely natural. There is no artificial emphasis, no sharpened outlines, no V-shaped "don-shari" tuning. And yet the ambience, the microscopic detail, comes through remarkably well. The lingering tone after a note has sounded emerges to a degree that made me think, "Was all of that really in there?" The attack of each sound is fast and everything comes out clearly, yet it is a different kind of clarity from mere crispness. Many TWS earphones so far have pushed a crunchy, edgy character, but here I hear no such outline emphasis at all.
The bass in particular is good. Until now, low frequencies have tended to be emphasized in an artificial way: there was plenty of bass, but you could not tell what was inside it. With the ZE8000 I can. On top of that, the bass has a three-dimensional body, and the pitch of bass notes is easy to follow. That impressed me. Could you tell us some of the technical factors behind this kind of sound?
Hamazaki: One factor is that when the acoustic chamber is sealed for waterproofing, the low end rises (see Vol. 1). This is where masking [1] comes into play. In masking, low-frequency sound obstructs sound in higher frequency bands, so when the bass is raised the mids and highs become harder to hear. The usual response is to raise the treble as well, and that is how the typical don-shari sound is born. With a recorded bass like the one you just mentioned, however, the sound is captured by setting up a microphone or taking a line output, and what we worked on was raising the resolution of that so-called deep bass.
To make the pitches you mentioned easier to distinguish, the woody attack when the strings are plucked with the fingers is important, for example, and that is actually governed by the mid-to-high-frequency characteristics. Recordings are already adjusted so that pitch can be heard clearly, but play them back through an overly don-shari characteristic and the pitch becomes hard to follow, or the attack becomes exaggerated. That was the problem.
Asakura: So it sounds unnatural.
Hamazaki: With the ZE8000, a major factor is that we decided to stop that whole game of boosting the low end and then raising the highs; in other words, we stopped manipulating the frequency response just to dodge masking.
Hosoo: When we first set out to use digital signal processing, we ran all sorts of experiments on a PC. The effect was remarkable, but it could not be carried over into mass production. The reason is that, however hard you try, the unit-to-unit error when mass-producing driver units is about ±3 dB in the low-frequency allowance (tolerance). From a production standpoint that is simply common sense, and even if you tighten the spec to ±1 dB, very few units will actually meet it. To solve this we have been in trial and error since 2016, and one of the results is a manufacturing method that bonds the diaphragm and the edge without using adhesive.
Asakura: So the adhesive was the culprit!
Hosoo: That's right. Driver units use large amounts of adhesive. However light and excellent the diaphragm material, the heaviest thing ends up being the adhesive. In addition, the lead wire from the voice coil is glued to the edge, which is absolutely necessary to prevent the wire from breaking. The diaphragm is supposed to move in an accurate piston motion, but with adhesive attached here and there, rocking motion inevitably occurs. We had to deal with this, but it could not be done easily. It took four to five years.
Asakura: In manufacturing, that was simply common sense, taken for granted.
Hosoo: It was taken as a given.
Asakura: And you refused to accept that error.
Hosoo: That's right. There have probably been people who thought the same thing before, but in most cases, I think, signal processing was used to correct it. In reality, signal processing is often used not for positive purposes but to patch over something bad. Since every part carries its own error, the finer the correction you attempt, the more the correction itself goes wrong because of those errors. That is why many people in manufacturing say, "Apply signal processing and the sound gets worse."
Asakura: Did it take a long time to rethink that from the ground up?
Hosoo: It did. Making a diaphragm without adhesive comes up quickly as an idea, but actually doing it took years, and we even had to change the production machinery itself.
Asakura: So it took many years to achieve.
Hosoo: Yes. The work of all those years happened to be ready just in time to be used in the development of the ZE8000.
Asakura: So the innovation in the diaphragm and the innovation in signal processing, on the electrical side, were running in parallel and came together.
Hosoo: That's right. When you shape frequency characteristics with signal processing you are working at the level of 0.2 dB, so if there is a 3 dB error in unit manufacturing, the conversation cannot even begin. That part was quite difficult.
The people who build driver units are mechanical engineers, "mechanism people". Signal processing is a separate field, so in an ordinary company it is quite hard to get both sides improving things toward the same goal. We could do it because the company is small: I think about the mechanism and the overall picture of the product, and Hamazaki handles the digital signal processing. Between the two of us, we can tune things down to very fine detail. That is where being small works in our favour.
Diaphragm cross-section
The diaphragm and the edge (made of a special silicone) are formed as a single piece.
The lead wires to the voice coil are also run through the air, thoroughly eliminating elements that would hinder accurate piston motion.
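To put rough numbers on the 0.2 dB correction steps versus the ±3 dB manufacturing spread that Hosoo describes above, here is a small illustrative sketch (not from the interview) that converts those figures into amplitude ratios.

```python
# Illustrative only: a ±3 dB unit-to-unit spread compared with a 0.2 dB
# correction step, expressed as amplitude ratios.
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to an amplitude ratio."""
    return 10 ** (db / 20)

correction_step = db_to_amplitude_ratio(0.2)   # ~1.023, about a 2% amplitude change
unit_high = db_to_amplitude_ratio(3.0)         # ~1.41, about a 41% amplitude change
unit_low = db_to_amplitude_ratio(-3.0)         # ~0.71

print(f"0.2 dB step : x{correction_step:.3f}")
print(f"+3 dB unit  : x{unit_high:.3f}")
print(f"-3 dB unit  : x{unit_low:.3f}")
```

A correction tuned in 0.2 dB steps is swamped when individual drivers can differ by many times that amount, which is why the tolerance had to come down before fine signal processing made sense.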
Asakura: Another thing that impressed me about the ZE8000 is its handling of microscopic signals. With some products, sounds such as reverberation tails vanish almost immediately. With speakers that decay is taken for granted, but the ZE8000 resolves sound so finely that you can hear what each sound is like and how it dissolves into the air. Being an earphone, the image is inevitably inside the head, but even so the depth and scale of the sound field come through clearly. I take that to mean it is controlled down to very fine detail and has the performance to reproduce even microscopic signals. Presumably the driver improvements you described are a major factor there.
Hosoo: Yes. Distortion is less than one tenth that of conventional products. What is more, such figures are normally only achieved above 100 Hz, whereas with the ZE8000 the distortion below 100 Hz is reduced to between one tenth and 1/120 of conventional levels. I think that clearly contributes.
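As a rough illustration of what a distortion-rate comparison like this measures, here is a minimal sketch (an assumption-laden demo, not final's measurement procedure) that estimates total harmonic distortion from the spectrum of a synthetic low-frequency test tone.

```python
# Illustrative only: estimate total harmonic distortion (THD) of a 50 Hz test
# tone by comparing harmonic energy with the fundamental in an FFT.
import numpy as np

fs = 48_000
f0 = 50                                    # low-frequency test tone
t = np.arange(fs) / fs                     # one second of signal
# Stand-in "measurement": a tone with small 2nd and 3rd harmonics added.
x = (np.sin(2 * np.pi * f0 * t)
     + 0.010 * np.sin(2 * np.pi * 2 * f0 * t)
     + 0.005 * np.sin(2 * np.pi * 3 * f0 * t))

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
bins = [round(k * f0 * len(x) / fs) for k in range(1, 6)]   # f0 and harmonics 2-5
fundamental = spectrum[bins[0]]
harmonics = np.sqrt(sum(spectrum[b] ** 2 for b in bins[1:]))
print(f"THD ≈ {100 * harmonics / fundamental:.2f} %")       # ~1.1 % for this input
```

Cutting such a figure to a fraction of its former value at frequencies below 100 Hz is the kind of comparison being described above.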
Asakura: In any case, the sound of the source itself comes out. It is such a completely different reproduction from the TWS earphones I have heard so far that I was confused at first, but on careful listening, this is what is correct. And once you accept that it is right, you want to go back and listen to all your sources again.
What surprised me is this: I teach music at a university, and there I use MP3 sources. On other earphones they are clearly inferior to linear PCM, but on the ZE8000 even these heavily compressed sources kept a tonal expressiveness that rather surprised me. We have talked mainly about the driver unit so far, but what did you focus on in particular on the signal-processing side?
Hamazaki: Right. The signal-processing R&D for the ZE8000 was carried out together with a digital-audio signal-processing engineer named Yuyama. Talking with him after the development was finished, the two of us agreed on something a little unusual: this was the first case in which signal processing was not used to clean up after the mechanical side. As Hosoo said, until now the acoustic side imposed limits; the adhesive problem, for example, meant a great deal of low-frequency distortion. So however elaborately you apply signal processing, the sound quality shifts with each driver's characteristic error, and there is no point in doing it with real precision.
Once the driver's tolerance tightened, we could develop, on a blank sheet of paper, the signal processing we genuinely wanted to do, in increments of as little as 0.1 dB. We used signal processing thoroughly, purely to improve the sound. For Yuyama, who had come from a major manufacturer, this was a first, and it was a very interesting first experience for me as well.
Asakura: In other words, instead of the old premise of "the driver has poor linearity, so let signal processing lend a hand", signal processing could for the first time stand on its own and do what it set out to do.
Hamazaki: I think that was the biggest thing in the signal-processing development.
Asakura: One established technique in signal processing is inverse correction: if the device has a certain characteristic, you apply the opposite characteristic to cancel it. Does this mean that kind of correction has become more precise?
Hamazaki: Yes. The basis is digital filtering: we tune using FIR (finite impulse response) filters [2] and IIR (infinite impulse response) filters [3], and the point is that this processing was carried out with great precision. Beyond the algorithms themselves there are, of course, limits: how much battery capacity can be spent on the processing, and how much computation the processor will allow. Deciding what to process, where, and by which method took a great deal of struggle, and I think it was a real achievement that it could be realized.
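As one conventional way the inverse correction discussed here can be built, the following sketch flattens a hypothetical dip in a driver's measured response with an FIR correction filter; it is only an illustration under assumed numbers, not final's actual processing.

```python
# Minimal sketch of inverse correction with an FIR filter (illustrative only).
import numpy as np
from scipy import signal

fs = 48_000  # assumed sample rate

# Hypothetical measured driver response: a 4 dB dip centred at 3 kHz.
freqs = np.array([0, 1_000, 3_000, 6_000, fs / 2])
measured_db = np.array([0.0, 0.0, -4.0, 0.0, 0.0])

# Inverse correction: target gain is the reciprocal of the measured magnitude,
# i.e. +4 dB where the driver dips.
target_gain = 10 ** (-measured_db / 20)
taps = signal.firwin2(255, freqs, target_gain, fs=fs)  # more taps: finer control, more computation

# Apply the correction to an audio block (white noise as a stand-in source).
x = np.random.randn(fs)
y = signal.lfilter(taps, [1.0], x)
```

The tap count is exactly the kind of knob Hamazaki alludes to: finer correction costs more computation, and on a battery-powered earphone that budget is limited.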
Hosoo: Another big thing in this development was that we could clearly share the characteristic we were aiming for. However finely you carry out the inverse correction just mentioned, without a target characteristic there is no way to decide which direction to go.
Asakura: Was that target characteristic very different from before?
Hosoo: It was different enough that we kept asking ourselves whether it was really all right.
Asakura: So, Mr. Hamazaki, having joined this company, what did you particularly insist on in the development of the ZE8000?
Hamazaki: Earphones and headphones have long had a guiding principle: reproduce with them the same state as when listening to loudspeakers. That is the conventional theory. This time, in a sense, we stopped comparing with speakers and asked only, "What are the best characteristics for an earphone as an earphone?" As a result some customers may be confused, but taken positively, it leads to the kind of good sound you have just described.
Asakura: Until now the "teacher" has been the sound and sound field of loudspeakers, but this time everything is completed within the earphone itself. What concrete differences does that produce?
Hamazaki: When the loudspeaker is the teacher, you consider how sound propagates from the speaker to the ears: how it wraps around the head and reflects off the pinnae and shoulders.
Asakura: The head-related transfer function [4].
Hamazaki: Yes. What is used there is the so-called HATS (head and torso simulator) [5], a mannequin with microphones at the eardrum positions, which embodies a head-related transfer function. But the HRTF given by a HATS is neither mine nor yours; it is, so to speak, the average of many people.
Asakura: A standardized mannequin.
Hamazaki: Until now there was no choice but to work that way. This time we set that aside and started from what humans actually pay attention to when they listen to sound. That is the big difference.
Asakura: In other words, without any preconditions, you listened with your own ears, searched for the sound that is optimal when heard that way, and tuned to it?
Hamazaki: No, we did not tune toward an "optimal sound" by ear. We proceeded purely theoretically, testing whether it could be done by that theory. In the course of this development we arrived at a certain theory, and once we have a little more evidence I would like to present it at an academic conference. The evidence is not yet sufficient, so I cannot state it here, but the ZE8000 was built on that theory. Its sound is constructed from the theory alone; we applied no EQ correction just to make it sound nicely balanced. In other words, we entrusted everything to the theory.
Asakura: There will be limits to what you can say before the conference presentation, but what sort of theory is it?
Hamazaki: One main point is that masking is not to be corrected through the amplitude-frequency characteristics of the earphone or headphone. Even without doing that, it should be possible to create a characteristic that people perceive as natural. With that concept as the starting point, the theory we built up through research was adopted in the ZE8000.
Asakura: I have spent my career listening to speakers, and what matters to me in video is understanding the director's intent. Until now, television enhanced the original picture, and the easiest example to understand is contour (edge) enhancement. In the SD era both preshoot and overshoot appear, so edges spread as if the white lines were bleeding, yet by emphasizing contours the 480p world could be made to look sharp. Once you reach full HD and 4K, you get crisp detail with no contour enhancement at all. So today, emphasizing contours is practically a crime.
The television world is shifting rapidly from a picture built up with enormous amounts of artificial processing to a natural one. And it is not just the sheer quantity of information: with recent HDR the amount of light information is on a different level. In short, the industry is working hard to reproduce on the screen the world we actually see.
Thought of in those terms: because bass causes masking, the treble gets pushed up by signal processing, and the result is a contour-enhanced sound completely unlike the raw sound. What I hear in the ZE8000 is natural, with no need for contour emphasis. And what then comes through is emotion.
I run a label called Ultra Art Record, on which I produced an album by the singer Mie Shojin. One track I often play is "Cheek to Cheek". It is a bright, pleasant song with plenty of information; the singing is good and Tsuyoshi Yamamoto's piano is just right, so I use it as a very good reference. By contrast, the fifth track, "You Don't Know Me", I rarely use as a reference. The original is a rather cheerful song by the American country singer Eddy Arnold about a boy in love with a girl who does not know his feelings at all. Ray Charles, however, covered it with an arrangement steeped in sadness, and on this album Mie Shojin sings the Ray Charles version. When we recorded it at Pony Canyon's Yoyogi studio in 2017, the song was so emotional that everyone was in tears. Her singing is wonderful, and Tsuyoshi Yamamoto's piano layers sadness upon sadness.
Emotion is a higher-order concept. To reproduce it, the audio characteristics have to be there first; without them it is hopeless. But characteristics alone are not enough either; emotion is something drawn out from beyond them. The ZE8000 really does that. It is the first TWS earphone on which I have heard that kind of sound, and in that sense I was surprised.
If you pursue the physical characteristics thoroughly, I suspect linearity matters most. With the linearity properly in place, this struck me as the first TWS earphone on which I could hear music as emotion rather than simply listening to it as sound.
Hamazaki: Since video has come up, a slight digression: when we were working on 8K displays at the NHK Science & Technology Research Laboratories, we shot an orchestra with an 8K camera. The camera never panned or zoomed. The sound was recorded and played back in 22.2-channel three-dimensional audio. When the flute gradually rises into a solo, your eyes naturally go to the flute, and that is far more moving. Of course, as in a music programme, there are moments you want a cut to the conductor from behind, but this way you can immerse yourself in the music. It is the same story with "let's emphasize the vocals" or "let's emphasize the bass": the maker is presuming which part the listener wants to hear, or how the music ought to feel.
The transducer should be truly flat, and what to look at and what to listen to should be left to the user. I think it is ultimately more moving to stop the practice of the manufacturer emphasizing something and telling you to listen to it.
Asakura: Exactly. Rather than doing something in the circuitry to create something new artificially, it conveys the emotion and information of the original source as they are. I often use the word "emotion", and the Japanese words for emotion and information are interesting to compare. In English they are entirely different words, but in Japanese both 情報 (information) and 情感 (emotion) contain the character 情, "feeling": information is feeling reported, emotion is feeling that reaches the heart. The distinction comes out nicely. Listening to audio begins as information, what is sounding and who is playing, but information alone does not move you. What moves you in the end is emotion, the musician's emotion. To convey the musician's emotion, linearity is extremely important; if there are obstacles or oddities on the path before it reaches the heart, the emotion disappears. The greatness of the ZE8000, then, comes from delivering the information and the emotion of the original source without any coloration or processing.
This is really the way of thinking of the high-end audio world. The further you go into the high end, the more tone controls disappear. If one mark of high-end is hearing the emotional charge of the original source without any loss, then in that sense the ZE8000 embodies a very high-end philosophy and a high-end way of listening. Listening to various sources, you can feel each source's quality and character, and the intent of "I want this song conveyed like this". That, I think, is the fruit of the hard work in technical development. That said, when the sound changes this much, I imagine users are quite confused.
Hosoo: I agree. The users who are confused, though, tend to be earphone enthusiasts. Conventional earphone sound is, as you say, drawn with contours, whether because it was modelled on speakers or on the averaged HATS. That said, contours are not in themselves bad; it is an enjoyable world of its own, and we intend to continue technical innovation in that direction as well.
Ever since the E3000 our sound has had a certain realism, and even when we created the high-end headphone D8000 it did not follow the usual formula, yet it was very highly evaluated. That gave us real confidence that our hypothesis was right. The D8000, however, achieved its sound purely acoustically, which meant pouring in materials, and it ended up costing nearly 500,000 yen. With the ZE8000 I think we may have managed to democratize high-end sound. People who have mainly listened to speakers, music fans rather than earphone fans, seem to take to it readily. I suspect that listening over time will resolve the confusion.
Asakura: Many people who have listened to speakers have not yet come over to the world of earphones, and have felt no need to. But when they hear this they will find it remarkable, an extension of the speaker world, so it may well help broaden the audience for earphones.
Asakura: Another interesting angle on the contour story comes from painting. Outlines play an important role in Japanese painting, and more recently in anime: the anime world is built from contours and flat planes, and that too is an art. Western painting, by contrast, has no outlines; there is only detail and gradation, with the boundary between background and object drawn through differences in color. So contours are not in themselves bad. But rather than adding them along the way, sources that have outlines should come out with their outlines, and sources without them should come out as a three-dimensional sound image even without them. That, I think, is what is groundbreaking about the ZE8000. People accustomed to outlined sound will be confused at first, but I believe they will discover new directions in a sound like this.
>> See also: Part 3, "Visible sound? What is 8K sound?"
Notes
[1] Masking
Masking is a phenomenon of human hearing in which one sound makes another sound difficult to hear. The sound that does the obscuring is called the masker. One cause of masking can be explained by the way sound waves arriving at the eardrum excite the basilar membrane in the inner ear; on this account, when the masker is a pure tone, pure tones at frequencies higher than the masker's frequency are easily masked. The phenomenon can be experienced when strong low-frequency energy, such as traffic noise, makes human voices hard to hear. Likewise, when playing back recorded music, if low-frequency instruments such as bass and kick drum are too loud, vocals and higher-pitched instruments become hard to hear and the balance of the music suffers. Masking is an important auditory phenomenon in digital signal processing, for example as the perceptual model used in lossy codecs, and understanding it also matters in music recording.
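As a simple way to hear the phenomenon this note describes, the sketch below (an illustrative demo with arbitrary levels, not part of the original article) writes a short WAV file in which a quiet 1 kHz tone plays over a loud 80 Hz tone for the first half and then alone for the second half; the higher tone is typically much easier to notice once the masker stops.

```python
# Illustrative masking demo: a loud low-frequency masker plus a quiet probe tone.
import numpy as np
from scipy.io import wavfile

fs = 48_000
t = np.arange(3 * fs) / fs                       # 3 seconds

masker = 0.8 * np.sin(2 * np.pi * 80 * t)        # loud 80 Hz masker
probe = 0.02 * np.sin(2 * np.pi * 1_000 * t)     # quiet 1 kHz probe

masker[len(t) // 2:] = 0.0                       # mute the masker halfway through
mix = masker + probe

wavfile.write("masking_demo.wav", fs, (mix * 32767).astype(np.int16))
```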
[2] FIR (finite impulse response) filter
An FIR (Finite Impulse Response) filter is a type of digital filter often used in audio signal processing. Its impulse response has finite length, and the output is produced by delaying, weighting, and summing the input signal. Because an FIR filter can be designed with symmetric coefficients, it can alter the frequency response without distorting the relative phase of the input signal. FIR filters play a very important role in audio signal processing because frequency characteristics can be designed with high accuracy, and they are often used when the target frequency response is clearly defined: for example, in inverse-filter processing to reduce noise, or when correcting the response at the listening position of a loudspeaker playback system toward an arbitrary target. Compared with IIR (Infinite Impulse Response) filters, FIR filters are unconditionally stable and, because their phase response can be made linear, they are less prone to phase distortion. On the other hand, an FIR filter requires far more computation to obtain the same frequency response as an IIR filter, and it introduces a delay between input and output proportional to the number of taps. When audio must stay synchronized with video (lip sync), or in interactive situations such as games, particular attention must therefore be paid to the delay when using FIR filters.
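A minimal sketch of the kind of FIR filtering this note describes, using SciPy; the cutoff frequency, tap count, and test signal are arbitrary choices for illustration.

```python
# Minimal FIR low-pass example: linear phase, with a delay set by the tap count.
import numpy as np
from scipy import signal

fs = 48_000
numtaps = 257                        # more taps: sharper response, more delay
taps = signal.firwin(numtaps, cutoff=4_000, fs=fs)   # linear-phase low-pass design

x = np.random.randn(fs)              # one second of noise as a test signal
y = signal.lfilter(taps, [1.0], x)

# A symmetric (linear-phase) FIR delays every frequency equally:
group_delay_samples = (numtaps - 1) / 2
print(f"group delay: {group_delay_samples} samples "
      f"({1000 * group_delay_samples / fs:.1f} ms)")
```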
[3] IIR (infinite impulse response) filter
An IIR (Infinite Impulse Response) filter is a type of digital filter often used in audio signal processing. It has a feedback structure, filtering the current input using past output values as well as past input values. Compared with an FIR (Finite Impulse Response) filter, an IIR filter needs far less computation to achieve the same frequency response, introduces little delay between input and output, and is easy to implement; it is also better suited to steep, high-order filtering. Its drawback is that the phase response is nonlinear, so phase distortion is more likely to occur. IIR filters are widely used in the equalizers (EQ) of audio equipment: with an FIR-based equalizer it is difficult to change parameters while monitoring the sound in real time, so equalizers that must be adjusted while listening generally use IIR filters.
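For contrast with the FIR sketch above, here is an equally minimal IIR example; the filter type and order are arbitrary illustrative choices, showing how few coefficients an IIR filter needs for a steep response.

```python
# Minimal IIR example: a 4th-order Butterworth low-pass with feedback (a) and
# feedforward (b) coefficients, applied the same way as the FIR sketch above.
import numpy as np
from scipy import signal

fs = 48_000
b, a = signal.butter(4, 4_000, btype="lowpass", fs=fs)

x = np.random.randn(fs)
y = signal.lfilter(b, a, x)

print(len(b), len(a))   # 5 and 5 coefficients, versus hundreds of FIR taps
```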
[4] Head-related transfer function
The head-related transfer function (HRTF) is a transfer function that describes how sound waves arriving at a person are altered by the shape of the body around the head. The HRTF therefore changes with the position of the sound source generating the arriving sound waves. HRTFs play an important role in audio signal processing for spatial sound: for example, they are used in binaural rendering, which reproduces over headphones or earphones the spatial impression that would be heard in a 3D audio playback environment with loudspeakers arranged three-dimensionally. To measure an HRTF, a person is placed in an anechoic room, measurement signals are emitted from loudspeakers placed in various directions around them, and the responses are recorded with microphones positioned near the entrance of the ear canal or at the eardrum. In recent years, methods have also appeared that estimate body shape from photographs or scan data and compute the HRTF numerically from that shape, for example with the boundary element method.
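A minimal sketch of the binaural rendering this note mentions: a mono signal convolved with a left/right pair of head-related impulse responses (HRIRs). The HRIR file names are hypothetical stand-ins for a measured HRTF data set.

```python
# Binaural rendering sketch: convolve a mono source with left/right HRIRs.
import numpy as np
from scipy import signal
from scipy.io import wavfile

fs = 48_000
mono = np.random.randn(fs)                 # stand-in for a mono source signal

hrir_left = np.load("hrir_left.npy")       # hypothetical: source -> left eardrum impulse response
hrir_right = np.load("hrir_right.npy")     # hypothetical: source -> right eardrum impulse response

left = signal.fftconvolve(mono, hrir_left)[: len(mono)]
right = signal.fftconvolve(mono, hrir_right)[: len(mono)]

stereo = np.stack([left, right], axis=1)
stereo /= np.max(np.abs(stereo))           # normalize before writing to 16-bit
wavfile.write("binaural_demo.wav", fs, (stereo * 32767).astype(np.int16))
```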
[5] HATS
HATS is an abbreviation of Head And Torso Simulator and refers to a mannequin consisting of only the head and torso, used in acoustic measurement. A HATS is modelled with average head, pinna, and chest shapes, and is used to estimate the sound waves that would arrive at the eardrum when a person is placed in a given sound field; the ear canal and the pinna in particular are reproduced with high accuracy. Typically, artificial ears, each consisting of an omnidirectional measurement microphone standing in for the eardrum and a simulator imitating the ear canal, are mounted on the left and right sides of the head, and various measurements are made with these microphones. HATS is one of the important measuring devices in the development and evaluation of audio equipment and acoustic systems, and is also used in the research and development of earphones and headphones.